Every model request, through one control point you own.
TL;DR. The LLM Gateway sits in front of every model your agents and people use. It routes by intent, enforces DLP and policy in code, caps budgets, scores quality with reinforcement learning, and logs every token. Multi-LLM, BYO-key, EU-sovereign or fully on-prem.
What this is about
The moment AI enters an organisation, a question follows: which model saw what, under whose key, at what cost, and can you prove it? Without a control point, the answer is "we don't know" — which is unacceptable in a regulated business. The LLM Gateway is that control point. Every request from every agent, copilot or integration passes through it, governed by policy you can read and audit.
How we run it
The gateway runs in your EU cloud or on-prem. Clients connect with a bearer token; the gateway decides which model handles the request (by intent, cost, availability), applies DLP and policy in code, enforces per-tenant and per-squad budgets, and falls back across models on failure. Reinforcement-learning scoring tracks output quality so routing improves over time. Keys are managed centrally — none scattered in repos or workflows. And every prompt, response, tool call and model choice is logged for audit.
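As a rough sketch of the client side, a request carrying a bearer token could be built like this in Python. The gateway URL and the JSON body fields (`intent`, `prompt`) are assumptions made for illustration; the actual API shape is not specified here. Nothing is sent. The point is that the client holds only a gateway token, never a provider key.

```python
import json
import urllib.request

# Hypothetical endpoint and payload fields, for illustration only.
GATEWAY_URL = "https://llm-gateway.example.internal/v1/requests"


def build_gateway_request(token: str, intent: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the gateway with a bearer token."""
    body = json.dumps({"intent": intent, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_gateway_request("example-token", "summarise", "Summarise this contract.")
```

Passing the request to `urllib.request.urlopen` would send it; model choice stays on the gateway side, so the client names an intent rather than a provider.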
When it fits
Any organisation running AI in production that needs governance: which model, which data, which cost, provable. Finance, healthcare, public sector and KRITIS (critical-infrastructure) operators, where "we don't know which model generated this" is not an answer. Teams running multiple agents and copilots that need one policy plane over all of them.
What we don't do
We don't sit between you and your model provider as a paid router — BYO-key is standard, you pay the provider directly. We don't persist your prompts beyond the audit log you control. We don't lock model choice to one vendor — the gateway abstracts them.
Your AI rules, versioned and diffable.
DLP, budgets, model whitelists, role gates — all expressed in code, not a console. One source of truth, auditable, and the rule travels with the request.
-
DLP at the boundary
Block PII, secrets and regulated data before they ever reach a model.
-
Budget caps
Per tenant, per squad, per day. Overflow routes to a cheaper fallback, not a surprise bill.
-
Model whitelist
Decide which models are allowed for which workloads. Swap vendors by config.
-
Role gates
Some requests require a role (e.g. DPO approval for PII). Enforced, logged, provable.
# LLM gateway policy (Starlark)
def on_request(req):
    # Role gate: PII needs DPO approval.
    if req.contains_pii() and not req.user.has_role("dpo"):
        return deny("PII without DPO approval")
    # Budget cap: overflow routes to the fallback model.
    if req.tokens > budget.daily_remaining(req.tenant):
        return route("fallback-model")
    # Model whitelist: only models permitted for this tenant.
    if req.model not in policy.whitelist(req.tenant):
        return deny("model not permitted for tenant")
    return allow()
What you can hand off
-
Gateway deployment
In your EU cloud or on-prem. Clients connect by bearer token; no keys in repos.
-
Policy as code
DLP, budgets, model whitelist and role gates expressed in Starlark — diffable and auditable.
-
Multi-LLM routing & fallback
Intent-based routing across Anthropic, OpenAI, Mistral and locally-run models, with fallback chains.
-
RL quality scoring
Reinforcement-learning scoring of outputs so routing improves and weak paths are flagged.
-
Full audit log
Every prompt, response, tool call and model choice logged — the record an auditor asks for.
-
Clone-ready config
Gateway config and policies as code — yours to take in-house from day 30.
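To make the RL quality scoring item concrete, here is a minimal sketch in Python. The scoring method is not specified on this page, so the example swaps in an epsilon-greedy bandit, one of the simplest reinforcement-learning schemes. The class name, the 0 to 1 quality scale and the model names are all hypothetical.

```python
import random


class QualityRouter:
    """Epsilon-greedy bandit over models: a stand-in for RL quality scoring."""

    def __init__(self, models, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.scores = {m: 0.0 for m in models}  # running mean quality per model
        self.counts = {m: 0 for m in models}    # observations per model

    def choose(self):
        # Explore occasionally so a weak early score is not permanent.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.scores))
        return max(self.scores, key=self.scores.get)

    def record(self, model, quality):
        # Incremental mean: new = old + (x - old) / n
        self.counts[model] += 1
        n = self.counts[model]
        self.scores[model] += (quality - self.scores[model]) / n

    def weak_paths(self, threshold):
        # Models with at least one observation and a mean below the threshold.
        return [m for m, s in self.scores.items()
                if self.counts[m] > 0 and s < threshold]


router = QualityRouter(["model-a", "model-b"], epsilon=0.0)
router.record("model-a", 0.4)
router.record("model-b", 0.9)
router.record("model-b", 0.7)
```

With `epsilon=0.0` the router always picks the best-scoring model; a small positive value keeps testing the alternatives so routing can recover when a model improves.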
What happens to one model request.
From client call to audited response — every decision the gateway makes is policy-driven and logged.
Governed Request
- Step 1 Client (agent/copilot) calls the gateway with a bearer token.
- Step 2 DLP scan: PII, secrets and regulated data checked against policy.
- Step 3 Budget check against tenant/squad daily remaining; overflow routes to fallback.
- Step 4 Model selected by intent, cost and availability from the tenant whitelist.
- Step 5 Request sent under your key (BYO-key); response scored by the RL layer.
- Step 6 Prompt, response, model choice and policy decisions written to the audit log.
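The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's implementation: the two DLP patterns, the `Tenant` and `Gateway` classes and the decision dictionaries are hypothetical, model selection is reduced to "first whitelisted model", and Step 5 (the provider call and RL scoring) is left out.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; real DLP needs far broader coverage.
DLP_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class Tenant:
    name: str
    whitelist: list        # allowed models, in order of preference
    daily_remaining: int   # tokens left today
    fallback_model: str


@dataclass
class Gateway:
    audit_log: list = field(default_factory=list)

    def handle(self, tenant: Tenant, prompt: str, tokens: int) -> dict:
        decision = self._decide(tenant, prompt, tokens)
        # Step 6: every decision is recorded, including denials.
        self.audit_log.append({"tenant": tenant.name, "prompt": prompt, **decision})
        return decision

    def _decide(self, tenant, prompt, tokens):
        # Step 2: DLP scan before anything reaches a model.
        hits = [name for name, rx in DLP_PATTERNS.items() if rx.search(prompt)]
        if hits:
            return {"action": "deny", "reason": "dlp:" + ",".join(hits)}
        # Step 3: budget check; overflow routes to the fallback.
        if tokens > tenant.daily_remaining:
            return {"action": "route", "model": tenant.fallback_model, "reason": "budget"}
        # Step 4: first whitelisted model stands in for intent/cost/availability.
        if not tenant.whitelist:
            return {"action": "deny", "reason": "no permitted model"}
        tenant.daily_remaining -= tokens
        return {"action": "route", "model": tenant.whitelist[0], "reason": "ok"}


gw = Gateway()
acme = Tenant("acme", ["model-a", "model-b"], daily_remaining=1000, fallback_model="small-model")
r1 = gw.handle(acme, "Summarise the Q3 report.", tokens=400)
r2 = gw.handle(acme, "Mail jane.doe@example.com the draft.", tokens=50)
r3 = gw.handle(acme, "Summarise the Q4 report.", tokens=900)
```

In this run the first request is routed to `model-a`, the second is denied by the email pattern, and the third exceeds the remaining 600 tokens and goes to the fallback. All three land in `gw.audit_log`.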
Fallback Chain
- Step 1 Primary model times out or errors.
- Step 2 Gateway retries on the next model in the tenant's fallback chain.
- Step 3 Degradation logged; quality delta tracked for the routing model.
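A fallback chain of this kind might look like the following Python sketch. The function and event names are hypothetical, `fake_call` stands in for a real provider call, and the quality-delta tracking from Step 3 is not modelled.

```python
class ModelError(Exception):
    """Raised by a model call that times out or errors."""


def call_with_fallback(chain, prompt, call_model, log):
    """Try each model in order; log every degradation; raise if all fail."""
    last_error = None
    for position, model in enumerate(chain):
        try:
            response = call_model(model, prompt)
        except ModelError as exc:
            log.append({"event": "model_failed", "model": model, "error": str(exc)})
            last_error = exc
            continue
        if position > 0:
            # Served by something other than the primary: record the degradation.
            log.append({"event": "degraded", "served_by": model, "primary": chain[0]})
        return model, response
    raise ModelError(f"all models in chain failed: {chain}") from last_error


def fake_call(model, prompt):
    # Stand-in for a provider call: the primary always times out.
    if model == "primary-model":
        raise ModelError("timeout")
    return f"{model} answered: {prompt}"


events = []
served_by, answer = call_with_fallback(
    ["primary-model", "secondary-model"], "hello", fake_call, events
)
```

Catching only `ModelError` is deliberate: a policy denial or a bug should surface immediately rather than being retried on the next model.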
Product facts
| Fact | Detail |
|---|---|
| Models | Anthropic, OpenAI, Mistral, locally-run (Llama family, GPT-OSS family) |
| Key model | BYO-key standard · you pay the provider directly |
| Policy | Starlark — DLP, budgets, whitelist, role gates · diffable |
| Routing | Intent-based + RL quality scoring + fallback chains |
| Audit | Every token, response, tool call and model choice logged |
| Deployment | EU cloud, on-premise or air-gapped |
| Persistence | Logs for audit; prompts not retained beyond your log |
| Clone handover | Config and policies as code, from day 30 |
Asked before the briefing
-
Do you sit between us and the model provider?
Only as policy and audit. BYO-key is standard: your request goes to your provider under your key. We don't meter or resell tokens.
-
Which models can it route to?
Anthropic, OpenAI, Mistral and locally-run models. The gateway abstracts them, so you swap vendors by config, not by re-integration.
-
How is DLP enforced?
In policy code at the boundary. PII, secrets and regulated data are checked before a request reaches any model; violations are denied and logged.
-
What exactly is logged?
Every prompt, response, tool call, model choice and policy decision: the audit record regulators ask for. You control retention.
-
Can we run it fully offline?
Yes. On-prem and air-gapped deployments run against locally-hosted models with no external connectivity.
-
Can we take it in-house?
Yes. Config and policies are code; the clone handover gives your team a running gateway from day 30.
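For the logging answer above, one way to picture the record is a single JSON line per request. This is a sketch under assumptions: the field names and the JSON Lines layout are hypothetical, chosen only to cover the items named (prompt, response, tool calls, model choice, policy decision).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    tenant: str
    prompt: str
    response: str
    model: str
    policy_decision: str
    tool_calls: list = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json_line(self) -> str:
        # sort_keys gives a stable field order, which keeps logs diffable.
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    tenant="acme",
    prompt="Summarise the Q3 report.",
    response="Revenue grew.",
    model="model-a",
    policy_decision="allow",
    timestamp="2025-01-01T00:00:00+00:00",  # fixed example value
)
line = record.to_json_line()
```

One line per request keeps the log append-only and easy to ship to whatever retention store you control.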
One control point in front of every model.
We deploy the gateway against your stack, write a first policy set, and show the audit log an auditor would actually accept.