Abstract
Traditional large language models (LLMs) are single, massive models that are expensive and time-consuming to train. A mixture of experts (MoE) architecture mitigates the cost and inflexibility of traditional LLMs by activating only a subset of the model parameters (known as experts) at each inference step. However, even MoEs lack true modularity, as their experts are trained together and generally do not collaborate. This disclosure describes an LLM architecture and techniques that use multiple complete, independently pre-trained LLMs as modular macro-experts. The architecture, referred to as a heterogeneous macro mixture of experts (macro-MoE), includes a trainable gating network that dynamically routes input prompts to the most appropriate expert or sequence of experts. A dual-function orchestrator synthesizes parallel outputs for simple tasks and manages a collaborative, multi-step generation process for complex tasks by routing intermediate results between different experts. The techniques enable a highly modular, computationally efficient LLM that can solve complex problems by leveraging the specialized strengths of diverse, state-of-the-art models within a cohesive, integrated framework.
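The sketch below is a minimal, illustrative rendering of the control flow described in the abstract: a gating network selects among complete pre-trained LLMs acting as macro-experts, and a dual-function orchestrator either synthesizes parallel outputs (simple tasks) or chains experts over multiple steps (complex tasks). The disclosure does not specify an API; all class names, method names, and the keyword-based scoring below are hypothetical stand-ins for trained components, included only to make the routing structure concrete.

```python
# Hypothetical sketch of the macro-MoE routing described above.
# Each "macro-expert" stands in for a complete, independently pre-trained LLM,
# abstracted here as a simple text -> text callable.
from typing import Callable, Dict, List

Expert = Callable[[str], str]


class GatingNetwork:
    """Toy stand-in for the trainable gating network that scores experts for a prompt."""

    def __init__(self, experts: Dict[str, Expert]):
        self.experts = experts

    def route(self, prompt: str, top_k: int = 1) -> List[str]:
        # A real gating network would be learned; keyword overlap is used here
        # purely to make the control flow runnable.
        scores = {name: sum(tag in prompt.lower() for tag in name.split("_"))
                  for name in self.experts}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:top_k]


class Orchestrator:
    """Dual-function orchestrator: parallel synthesis for simple tasks,
    sequential multi-step collaboration for complex tasks."""

    def __init__(self, gate: GatingNetwork, experts: Dict[str, Expert]):
        self.gate = gate
        self.experts = experts

    def run_simple(self, prompt: str) -> str:
        # Query the top-ranked experts independently and synthesize their outputs.
        outputs = [self.experts[name](prompt)
                   for name in self.gate.route(prompt, top_k=2)]
        return "\n".join(outputs)  # placeholder for the synthesis step

    def run_complex(self, prompt: str, max_steps: int = 3) -> str:
        # Route intermediate results between experts across multiple steps.
        state = prompt
        for _ in range(max_steps):
            expert_name = self.gate.route(state, top_k=1)[0]
            state = self.experts[expert_name](state)
        return state


if __name__ == "__main__":
    experts = {
        "code_expert": lambda p: f"[code expert] {p}",
        "math_expert": lambda p: f"[math expert] {p}",
    }
    orchestrator = Orchestrator(GatingNetwork(experts), experts)
    print(orchestrator.run_simple("Explain this code snippet"))
    print(orchestrator.run_complex("Solve this math problem, then write code for it"))
```

In a full implementation, the gating network would be the only trainable component, while the macro-experts remain frozen, which is what makes the architecture modular: experts can be swapped or added without retraining the ensemble.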
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
NA, "Modular Language Model Architecture with Collaborative Routing Between Heterogeneous Experts", Technical Disclosure Commons, (August 18, 2025)
https://www.tdcommons.org/dpubs_series/8468