Abstract
Organizations in regulated or disconnected environments may face challenges in deploying high-performance generative models due to data-residency or connectivity constraints, while local models can exhibit a performance gap on certain complex tasks. A hybrid architecture can augment a local student generative model with a dynamic distillation cache. The system can capture both outputs (such as final answers) and underlying reasoning patterns from a remote teacher model. When a new query is received, the system can use semantic similarity to retrieve relevant cached reasoning and provide it as context to the local student model. This form of in-context distillation may allow the local model's performance to improve based on live usage. The approach can help bridge capability gaps while supporting operational resilience and data-sovereignty requirements.
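The sketch below illustrates one possible shape of such a cache, assuming a Python deployment. The embed() stub, the similarity threshold, the top-k value, and the prompt template are illustrative assumptions, not part of the disclosure; a real system would use a proper sentence-embedding model and the deployed teacher and student model APIs.

```python
# Minimal sketch of a dynamic distillation cache: store teacher answers
# plus reasoning traces, retrieve them by semantic similarity, and prepend
# them as in-context examples for the local student model.
from dataclasses import dataclass
import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Placeholder embedding (hashed bag-of-words, L2-normalized).
    Stands in for a real semantic-embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


@dataclass
class CacheEntry:
    query: str
    reasoning: str        # teacher's reasoning trace
    answer: str           # teacher's final answer
    embedding: np.ndarray


class DynamicDistillationCache:
    def __init__(self, threshold: float = 0.75, top_k: int = 3):
        self.entries: list[CacheEntry] = []
        self.threshold = threshold  # minimum cosine similarity to reuse
        self.top_k = top_k

    def store(self, query: str, reasoning: str, answer: str) -> None:
        """Capture a teacher response (answer plus reasoning) for reuse."""
        self.entries.append(CacheEntry(query, reasoning, answer, embed(query)))

    def retrieve(self, query: str) -> list[CacheEntry]:
        """Return the most semantically similar cached entries."""
        q = embed(query)
        scored = [(float(e.embedding @ q), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s >= self.threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[: self.top_k]]

    def build_student_prompt(self, query: str) -> str:
        """Prepend retrieved teacher reasoning as in-context examples."""
        context = "\n\n".join(
            f"Question: {e.query}\nReasoning: {e.reasoning}\nAnswer: {e.answer}"
            for e in self.retrieve(query))
        prompt = f"Question: {query}\nReasoning:"
        return f"{context}\n\n{prompt}" if context else prompt
```

In this sketch, a query with no sufficiently similar cached entry could be escalated to the remote teacher when connectivity permits, and the teacher's answer and reasoning stored via store(); the cache, and with it the student's effective capability, would then grow from live usage as the abstract describes.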
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
McCormack, Ben; Rodriguez, Pablo; Breitman, Karin; Chea, Sol; and Keren, Orna Berry, "Dynamic Distillation Cache for Augmenting a Local Generative Model with Teacher Model Reasoning", Technical Disclosure Commons, (January 29, 2026)
https://www.tdcommons.org/dpubs_series/9235