Large language models (LLMs) specialized for domains such as writing code, holding conversations, and generating content are available. A specialized LLM can reliably answer questions only in the domains on which it has been trained. The large number of specialized LLM types can make it difficult for a user, such as an application that generates LLM queries, to choose the right one. This disclosure describes techniques to automatically route query payloads among LLMs specialized for different domains. The techniques use a vector database to semantically match a user query to an LLM and provide a real-time feedback and adaptation mechanism. Security checks and access controls are applied centrally, in adherence with security compliance regimes. The techniques improve the end-to-end security posture and user experience of AI-based applications, and can also reduce the cost of querying large LLMs.
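The routing idea summarized above can be illustrated with a minimal sketch. It is not the disclosure's implementation. It assumes a toy bag-of-words embedding in place of a learned embedding model, an in-memory dictionary in place of a vector database, and a simple per-model weight as the feedback mechanism. All names (`Router`, `embed`, `route`, `feedback`, the model labels) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Router:
    """Hypothetical semantic router with centralized access control and feedback."""

    def __init__(self, domains, allowed):
        # domains: model name -> free-text description of its specialty.
        # The dict below plays the role of the vector database.
        self.index = {name: embed(desc) for name, desc in domains.items()}
        # Per-model weight, adjusted by feedback (assumed adaptation scheme).
        self.weight = {name: 1.0 for name in domains}
        # allowed: caller id -> set of model names the caller may use.
        self.allowed = allowed

    def route(self, caller, query):
        """Return the permitted model whose description best matches the query."""
        permitted = self.allowed.get(caller, set())
        candidates = [m for m in self.index if m in permitted]
        if not candidates:
            raise PermissionError(f"caller {caller!r} may not access any model")
        q = embed(query)
        return max(candidates, key=lambda m: self.weight[m] * cosine(q, self.index[m]))

    def feedback(self, model, good, rate=0.1):
        """Raise or lower a model's weight after a good or bad answer."""
        self.weight[model] *= (1 + rate) if good else (1 - rate)


router = Router(
    domains={
        "code-llm": "write debug python code function bug",
        "chat-llm": "casual conversation chat talk",
    },
    allowed={"app": {"code-llm", "chat-llm"}},
)
print(router.route("app", "fix the bug in this python function"))
```

In this sketch the access check runs before any similarity search, so a caller never receives a model it is not permitted to use, and repeated negative feedback gradually shifts borderline queries toward other models.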
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Namer, Assaf; Diaz, Hector; Miller, Jim; Maltzman, Brandon; and Vagts, Hauke, "AI-based Adaptive Load Balancer for Secure Access to Large Language Models", Technical Disclosure Commons, (October 31, 2023)