Abstract

Production voice-agent platforms typically hardcode their speech-to-text (STT) and text-to-speech (TTS) providers and read provider credentials from process environment variables at start-up. As a result, switching providers, or serving two tenants with different providers from a single process, requires a code change or a process restart, and every tenant's secrets must coexist in one shared environment. This disclosure describes a real-time voice pipeline that removes those constraints through a specific combination of four mechanisms: (1) each of the three pipeline stages (STT, brain language model (LLM), and TTS) is resolved at call time by looking up a provider in a registry keyed by a per-persona / per-tenant configuration; (2) the credential for the selected provider is fetched lazily through a configuration bridge at the moment of first use, never eagerly loaded into the process environment, using a memoized loader with a documented environment-variable fallback for standalone operation; (3) each pipeline instance is stateless across calls and scoped to a single live call (constructed when the call connects and destroyed on hang-up), so no cross-call state leaks; and (4) the pipeline emits three independent latency measurements (STT-ms, brain-LLM-ms, TTS-ms) plus a round-trip total, enabling per-stage SLA attribution across heterogeneous providers. The combination yields multi-tenant voice in which distinct tenants run distinct provider stacks, secrets need not live in shared environment variables, and operators obtain stage-level latency visibility. This document gives the architecture, the call-time resolution and lazy-loading algorithm with pseudocode, the data model, an honest prior-art delta against Twilio Voice Intelligence, Vapi.ai, Bland.ai, Vonage, and NVIDIA Riva, and a clean-room reference implementation, all released as public prior art so the technique remains free to practice.
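The four mechanisms summarized above can be illustrated with a minimal Python sketch. It is not the disclosure's reference implementation: every name in it (`REGISTRY`, `register`, `load_credential`, `CallPipeline`, the toy providers, the in-memory `_BRIDGE` dictionary standing in for the configuration bridge, and the `<PROVIDER>_API_KEY` environment-variable naming) is a hypothetical chosen for illustration, and the toy providers only transform strings rather than calling any real STT, LLM, or TTS service.

```python
import os
import time
from dataclasses import dataclass, field
from functools import lru_cache
from typing import Callable, Dict, Tuple

# Hypothetical provider registry: stage -> provider name -> factory.
REGISTRY: Dict[str, Dict[str, Callable]] = {"stt": {}, "llm": {}, "tts": {}}

def register(stage: str, name: str):
    """Decorator that adds a provider factory to the registry."""
    def deco(factory):
        REGISTRY[stage][name] = factory
        return factory
    return deco

# Stand-in for the configuration bridge (assumption: a plain dict).
_BRIDGE: Dict[Tuple[str, str], str] = {}

@lru_cache(maxsize=None)
def load_credential(tenant: str, provider: str) -> str:
    """Memoized lazy loader: bridge first, then environment-variable fallback."""
    secret = _BRIDGE.get((tenant, provider))
    if secret is None:
        secret = os.environ.get(f"{provider.upper()}_API_KEY")
    if secret is None:
        raise KeyError(f"no credential for {tenant}/{provider}")
    return secret

# Toy providers. Each fetches its credential only when first invoked.
@register("stt", "toy_stt")
def toy_stt(get_credential):
    def transcribe(audio: bytes) -> str:
        get_credential()
        return audio.decode("utf-8")
    return transcribe

@register("llm", "upper_llm")
def upper_llm(get_credential):
    def respond(text: str) -> str:
        get_credential()
        return text.upper()
    return respond

@register("llm", "reverse_llm")
def reverse_llm(get_credential):
    def respond(text: str) -> str:
        get_credential()
        return text[::-1]
    return respond

@register("tts", "toy_tts")
def toy_tts(get_credential):
    def synthesize(text: str) -> bytes:
        get_credential()
        return text.encode("utf-8")
    return synthesize

@dataclass
class CallPipeline:
    """One instance per live call; discarded on hang-up."""
    tenant: str
    config: Dict[str, str]  # stage -> provider name, from tenant/persona config
    timings_ms: Dict[str, float] = field(default_factory=dict)

    def _run(self, stage: str, payload):
        provider = self.config[stage]        # resolved at call time
        factory = REGISTRY[stage][provider]
        handler = factory(lambda: load_credential(self.tenant, provider))
        start = time.perf_counter()
        result = handler(payload)            # includes first-use credential fetch
        self.timings_ms[stage] = (time.perf_counter() - start) * 1000.0
        return result

    def handle_turn(self, audio: bytes) -> bytes:
        start = time.perf_counter()
        text = self._run("stt", audio)
        reply = self._run("llm", text)
        speech = self._run("tts", reply)
        self.timings_ms["total"] = (time.perf_counter() - start) * 1000.0
        return speech

# Usage: two tenants with different LLM providers in one process.
_BRIDGE.update({
    ("tenant-a", "toy_stt"): "a-stt", ("tenant-a", "upper_llm"): "a-llm",
    ("tenant-a", "toy_tts"): "a-tts",
    ("tenant-b", "reverse_llm"): "b-llm", ("tenant-b", "toy_tts"): "b-tts",
})
os.environ["TOY_STT_API_KEY"] = "env-fallback-key"  # tenant-b STT uses the fallback

pipeline_a = CallPipeline("tenant-a", {"stt": "toy_stt", "llm": "upper_llm", "tts": "toy_tts"})
pipeline_b = CallPipeline("tenant-b", {"stt": "toy_stt", "llm": "reverse_llm", "tts": "toy_tts"})
reply_a = pipeline_a.handle_turn(b"hello")
reply_b = pipeline_b.handle_turn(b"hello")
print(reply_a, reply_b, sorted(pipeline_a.timings_ms))
```

In this sketch the per-stage timer wraps the handler call, so the first turn of a call also pays for the credential lookup; a production design might choose to report that cost separately. Each `CallPipeline` carries only its own `timings_ms`, which is one way to keep per-call measurements from mixing across tenants.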

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.