Why our LLM Gateway filters before the model

The common assumption

"Let the model answer, then filter out what shouldn't go out." Sounds clean, but it's a dead end. If you filter only at the output, you notice a problem only after the model has already burnt expensive token budget. And if the model has worked something sensitive into its answer, it is already in the logs.

What we do instead

Our LLM Gateway filters in three stages:

Before the model: input filter. We check prompts for PII, business secrets, credentials, and policy violations. An employee who accidentally pastes a customer name plus bank details into a prompt gets a warning before the API call is made. The model never sees the data.

In the tool call: action filter. When the model wants to call a tool (read a file, close a ticket, send mail), we check the action against policy. Write operations require human-in-the-loop confirmation by default. Read operations are checked against their source (which tenant, which data class).

After the model: output filter. This is only a safety net, not the primary layer. We check outputs for the same things as inputs, and if something slips through, we block the response and raise an alert.
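The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the regex patterns, the tool names in WRITE_TOOLS, the "tenant" argument key, and the Verdict type are all assumptions made for the example, and real PII detection needs far more than three regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; a real gateway would use much more robust checks.
PATTERNS = {
    "iban": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key format
}

# Assumed names of tools that change state (write operations).
WRITE_TOOLS = {"close_ticket", "send_mail"}


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def scan_text(text: str) -> list[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]


def check_input(prompt: str) -> Verdict:
    """Stage 1: runs before the API call, so the model never sees a blocked prompt."""
    hits = scan_text(prompt)
    if hits:
        return Verdict(False, "input contains: " + ", ".join(hits))
    return Verdict(True)


def check_action(tool: str, args: dict, caller_tenant: str,
                 human_approved: bool = False) -> Verdict:
    """Stage 2: runs when the model requests a tool call."""
    # Write operations are human-in-the-loop by default.
    if tool in WRITE_TOOLS and not human_approved:
        return Verdict(False, "write operation needs human confirmation")
    # Reads are checked against the source: here, only the tenant.
    resource_tenant = args.get("tenant")
    if resource_tenant is not None and resource_tenant != caller_tenant:
        return Verdict(False, "resource belongs to another tenant")
    return Verdict(True)


def check_output(answer: str) -> Verdict:
    """Stage 3: safety net, same checks as the input filter."""
    hits = scan_text(answer)
    if hits:
        return Verdict(False, "output contains: " + ", ".join(hits))
    return Verdict(True)
```

In this sketch, check_input("Refund to IBAN DE89 3704 0044 0532 0130 00") is rejected before any model call, and check_action("close_ticket", {"tenant": "a"}, "a") is rejected until it is called again with human_approved=True.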

What it costs

In practice: 15–25 ms of added latency per request, depending on the depth of the input checks. Token cost does not increase; we actually save tokens, because nonsensical requests get caught before they reach the model.

What it has prevented

Within six months of operation across three mandates:

148 times a prompt with PII was caught — before being sent to the model.
3 times a model would have closed a production ticket without confirmation. Human-in-the-loop prevented it.
0 times did a business secret slip through (output filter as safety net).

Most of those 148 cases were not malicious. People paste data into prompts because it's fast. The filters provide the reminder that the model itself never gives.

Concrete lesson

If you build your own gateway or deploy one, ask:

Does it filter before the model — not just after?
Does the gateway see tool calls, not just chat text?
Is every action audited, including the rejected ones?
Can you maintain filters per tenant / per team / per use-case separately?

If the answer is yes four times, you are ahead of 90% of the stacks out there.
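The last two questions, auditing every decision and keeping filters separate per tenant, can be sketched as follows. This is a hedged illustration under assumptions: the tenant names, the category labels, and the TenantPolicy, AuditLog, and decide names are hypothetical and not taken from any particular gateway product. The log stores only the violated categories, never the raw text, so that the audit trail does not become a second copy of the sensitive data.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TenantPolicy:
    # Categories that this tenant's filters must block.
    blocked_categories: frozenset


# Hypothetical per-tenant configuration; unknown tenants get the strictest policy.
POLICIES = {
    "engineering": TenantPolicy(frozenset({"pii", "credentials"})),
    "support": TenantPolicy(frozenset({"pii", "credentials", "source_code"})),
}
DEFAULT_POLICY = TenantPolicy(frozenset({"pii", "credentials", "source_code"}))


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def record(self, tenant: str, stage: str, decision: str, reason: str) -> dict:
        """Append one audit entry; allowed and rejected decisions are both kept."""
        entry = {
            "ts": time.time(),
            "tenant": tenant,
            "stage": stage,
            "decision": decision,
            "reason": reason,  # category names only, no prompt or answer text
        }
        self.records.append(entry)
        return entry


def decide(tenant: str, stage: str, detected: set, log: AuditLog) -> bool:
    """Apply the tenant's policy to the detected categories and audit the result."""
    policy = POLICIES.get(tenant, DEFAULT_POLICY)
    violations = sorted(detected & policy.blocked_categories)
    allowed = not violations
    log.record(tenant, stage, "allow" if allowed else "reject", ", ".join(violations))
    return allowed
```

With this configuration, a prompt classified as containing source_code passes for the "engineering" tenant and is rejected for "support", and both decisions end up in the audit log.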