Abstract

Organizations in regulated or disconnected environments often cannot rely on remote, high-performance generative models because of data residency or connectivity constraints, while local models can exhibit a performance gap on complex tasks. A hybrid architecture can address this by augmenting a local student generative model with a dynamic distillation cache. The system can capture both final answers and the underlying reasoning patterns produced by a remote teacher model. When a new query arrives, the system can use semantic similarity to retrieve relevant cached reasoning and supply it as context to the local student model. This form of in-context distillation may allow the local model's performance to improve over time based on live usage. The approach can help bridge capability gaps while supporting operational resilience and data sovereignty requirements.
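
The Python sketch below illustrates one possible shape of the cache-and-retrieve loop described above. It is a minimal sketch, not the disclosed implementation: the sentence-transformers embedding model, the similarity threshold, and the prompt format are illustrative assumptions, and the teacher/student model calls are left to the caller.

    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed embedding backend

    class DistillationCache:
        """Stores teacher reasoning traces and retrieves them by semantic similarity."""

        def __init__(self, model_name: str = "all-MiniLM-L6-v2", threshold: float = 0.75):
            self.encoder = SentenceTransformer(model_name)
            # Each entry: (query embedding, query text, teacher reasoning, teacher answer)
            self.entries: list[tuple[np.ndarray, str, str, str]] = []
            self.threshold = threshold  # illustrative cutoff for a "relevant" hit

        def add(self, query: str, reasoning: str, answer: str) -> None:
            # Normalized embeddings make the dot product equal to cosine similarity.
            emb = self.encoder.encode(query, normalize_embeddings=True)
            self.entries.append((emb, query, reasoning, answer))

        def retrieve(self, query: str, k: int = 3) -> list[tuple[str, str, str]]:
            # Rank cached traces by cosine similarity to the incoming query.
            q = self.encoder.encode(query, normalize_embeddings=True)
            scored = [(float(np.dot(q, emb)), qry, r, a) for emb, qry, r, a in self.entries]
            scored.sort(key=lambda t: t[0], reverse=True)
            return [(qry, r, a) for s, qry, r, a in scored[:k] if s >= self.threshold]

    def build_student_prompt(query: str, examples: list[tuple[str, str, str]]) -> str:
        # Retrieved teacher traces become few-shot context for the local student model.
        blocks = [f"Question: {q}\nReasoning: {r}\nAnswer: {a}" for q, r, a in examples]
        return "\n\n".join(blocks + [f"Question: {query}\nReasoning:"])

In use, a cache hit keeps the query entirely local; on a miss, the query could fall through to the remote teacher when connectivity and policy permit, with the resulting reasoning trace added back to the cache so later similar queries are served locally.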

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
