Defensive Publications Series

Dual Edge-Server Language Model Orchestration Framework for Low-Latency Response Generation and Passively Improved Stored Context

Abstract

Conversational interfaces enable users to pose queries to, and receive answers from, a large language model (LLM). With server-hosted LLMs, there can be substantial latency between completing a query and receiving a response, due to network delays and server computation time. This disclosure describes providing a smaller, distilled model on an edge device to reduce latency by generating responses on-device. Owing to its smaller scope and size, the on-device model's responses may not be as high in quality as those obtained from an LLM. The query and the locally generated response are therefore provided to a server LLM that is instruction-tuned to correct the assumptions in responses generated by the on-device model. The higher-accuracy response from the server LLM can be used as context information by the on-device model when generating subsequent responses. This background interaction can be repeated for subsequent queries, producing high-quality responses with low latency.
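The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name DualModelOrchestrator, the callables toy_edge_model and toy_server_model, and the max_context parameter are all hypothetical stand-ins, and a real system would call an actual on-device model and a remote server LLM instead of the toy functions. The sketch only shows the ordering of steps: answer locally first, correct in the background, and store the corrected answer as context for later queries.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, List, Tuple

Turn = Tuple[str, str]  # (query, corrected response)


class DualModelOrchestrator:
    """Hypothetical orchestrator: fast edge answer now, server correction later."""

    def __init__(
        self,
        edge_model: Callable[[str, List[Turn]], str],
        server_model: Callable[[str, str], str],
        max_context: int = 4,  # assumed bound on stored corrected turns
    ) -> None:
        self._edge_model = edge_model
        self._server_model = server_model
        self._context: Deque[Turn] = deque(maxlen=max_context)
        self._lock = threading.Lock()
        # One worker keeps background corrections in query order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def ask(self, query: str) -> Tuple[str, "Future[str]"]:
        """Return the on-device draft immediately plus a handle to the correction."""
        with self._lock:
            context = list(self._context)  # snapshot of corrected turns so far
        draft = self._edge_model(query, context)
        pending = self._pool.submit(self._refine, query, draft)
        return draft, pending

    def _refine(self, query: str, draft: str) -> str:
        """Background step: the server model corrects the draft; store the result."""
        corrected = self._server_model(query, draft)
        with self._lock:
            self._context.append((query, corrected))
        return corrected


# Toy stand-ins (assumptions) so the sketch runs without any real model.
def toy_edge_model(query: str, context: List[Turn]) -> str:
    return f"draft answer to {query} ({len(context)} corrected turns in context)"


def toy_server_model(query: str, draft: str) -> str:
    return f"corrected answer to {query}"
```

In this sketch the caller shows the draft to the user right away and never blocks on the returned future. The corrected response only influences later queries through the stored context, which mirrors the passive improvement described in the abstract.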

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

n/a, "Dual Edge-Server Language Model Orchestration Framework for Low-Latency Response Generation and Passively Improved Stored Context", Technical Disclosure Commons, (March 04, 2024)
https://www.tdcommons.org/dpubs_series/6740
