Abstract

Modern AI platforms often funnel all inference through a single multi-provider gateway so that provider credentials, model allow-lists, cost telemetry, and policy are resolved in one place. The default reliability posture of such a gateway is to fail open: if the requested model or provider is unreachable, the gateway silently substitutes a cheaper or different model so that the call still returns. For general chat this is the right trade-off, because availability beats fidelity. For high-stakes governance decisions (an LLM-as-judge that certifies mastery, scores an assessment, or gates a release), silently re-basing the decision onto an unvalidated, cheaper, or differently aligned model is a correctness and safety defect, not graceful degradation.

This publication discloses a routing mechanism in which the fail-closed-versus-fail-open decision is bound to the semantic route class, not to the requesting caller. Each request bears a route identifier that selects a stored route descriptor declaring a provider, a default model, an allowed-model list, and a boolean fail-closed flag. A fail-closed route constructs a candidate chain consisting of only the requested provider and model, bypasses any local-inference or GPU-queue path, refuses any result that resolves to a different provider (a provider-identity mismatch), and writes an immutable audit record of every denial; it terminates rather than substitutes. A fail-open route builds an ordered, cross-provider fallback chain read from a database table, which itself degrades to a hardcoded in-code array when the database is unreachable, so the routing layer stays resilient to the failure of its own configuration store. The combination lets governance and general traffic share one provider-agnostic bridge with opposite, route-appropriate failure postures. This document is published defensively to keep the technique freely practicable and unpatentable by others.
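The mechanism described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: every name in it (`RouteDescriptor`, `ROUTES`, `HARDCODED_FALLBACK`, `AUDIT_LOG`, `build_chain`, `dispatch`, and the provider and model strings) is hypothetical. The provider call and the fallback-table read are passed in as plain callables, an in-memory list stands in for the immutable audit store, and the local-inference/GPU-queue path is omitted. The sketch also assumes that a model outside the allowed-model list is denied, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Callable, NoReturn, Optional

Candidate = tuple[str, str]                      # (provider, model)
CallFn = Callable[[str, str], tuple[str, str]]   # returns (serving provider, output)


@dataclass(frozen=True)
class RouteDescriptor:
    provider: str
    default_model: str
    allowed_models: tuple[str, ...]
    fail_closed: bool


# Hypothetical stored route descriptors, keyed by route identifier.
ROUTES = {
    "judge": RouteDescriptor("provider-a", "model-a-large", ("model-a-large",), True),
    "chat": RouteDescriptor("provider-a", "model-a-large",
                            ("model-a-large", "model-a-small"), False),
}

# Last-resort chain, used only when the fallback table cannot be read.
HARDCODED_FALLBACK: list[Candidate] = [("provider-b", "model-b-small")]

# Stand-in for an immutable audit store; a real one would be append-only.
AUDIT_LOG: list[dict] = []


class RouteDenied(Exception):
    """Raised when a request is terminated instead of being substituted."""


def deny(route_id: str, reason: str) -> NoReturn:
    """Record the denial, then terminate the request."""
    AUDIT_LOG.append({"route": route_id, "reason": reason})
    raise RouteDenied(f"{route_id}: {reason}")


def build_chain(route: RouteDescriptor, model: str,
                fetch_fallback_rows: Callable[[], list[Candidate]]) -> list[Candidate]:
    requested = (route.provider, model)
    if route.fail_closed:
        return [requested]              # exactly one candidate, no substitutes
    try:
        rows = fetch_fallback_rows()    # ordered rows from the fallback table
    except Exception:
        rows = HARDCODED_FALLBACK       # configuration store is unreachable
    return [requested] + [c for c in rows if c != requested]


def dispatch(route_id: str, model: Optional[str], call: CallFn,
             fetch_fallback_rows: Callable[[], list[Candidate]]) -> tuple[str, str]:
    route = ROUTES[route_id]
    model = model or route.default_model
    if model not in route.allowed_models:
        # Assumption: a model outside the allow-list is denied on any route.
        deny(route_id, f"model {model!r} not allowed")
    for provider, candidate in build_chain(route, model, fetch_fallback_rows):
        try:
            served_by, output = call(provider, candidate)
        except Exception as exc:
            if route.fail_closed:
                deny(route_id, f"{provider}/{candidate} unavailable: {exc}")
            continue                    # fail-open: try the next candidate
        if route.fail_closed and served_by != route.provider:
            deny(route_id, f"provider-identity mismatch: {served_by}")
        return served_by, output
    raise RuntimeError(f"{route_id}: every fallback candidate failed")
```

In this sketch the failure posture is read from the route descriptor alone, so two requests that differ only in route identifier behave oppositely when the requested provider is down: the fail-open route walks the fallback chain (or the hardcoded array if the table read raises), while the fail-closed route records a denial and raises.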

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
