Abstract

Scaling large language models (LLMs) for inference can be difficult because of their computational resource requirements. LLM deployments typically include computational resources for the LLM itself and additional resources that run separate hand-tuned or learned models for resource allocation. The overall complexity of resource configuration for server-based LLM deployments therefore covers both the LLM and the separate allocation models. This disclosure describes the use of a language model to perform its own computational resource management. Per the techniques, resource management metadata is provided to the language model as an additional input along with the incoming query. By combining the user query and resource availability metadata into a prompt, the described techniques leverage the predictive power of the LLM to perform computational resource allocation instead of relying on a separate model. The LLM can thus manage its own resources, with no need to train or maintain separate models.
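
The abstract does not specify a prompt layout or an output format, so the following Python sketch is only one possible reading of "combining the user query and resource availability metadata into a prompt". The metadata fields (`free_accelerators`, `free_memory_gb`, `queue_depth`), the bracketed section markers, the JSON allocation line, and the helper names `build_prompt` and `parse_allocation` are all hypothetical and are not taken from the disclosure. No model is called; a hard-coded string stands in for the model's response.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ResourceMetadata:
    """Hypothetical snapshot of serving resources; field names are assumptions."""
    free_accelerators: int
    free_memory_gb: float
    queue_depth: int


def build_prompt(query: str, meta: ResourceMetadata) -> str:
    """Combine the user query and resource metadata into a single prompt."""
    return (
        "[RESOURCES]\n"
        f"{json.dumps(asdict(meta), sort_keys=True)}\n"
        "[QUERY]\n"
        f"{query}\n"
        "[INSTRUCTION]\n"
        "First output one JSON line with the keys \"accelerators\" and "
        "\"max_output_tokens\" describing the resources to allocate for this "
        "query, then answer the query."
    )


def parse_allocation(model_output: str, meta: ResourceMetadata) -> dict:
    """Read the allocation from the first output line and cap it at what is free."""
    first_line = model_output.strip().splitlines()[0]
    requested = json.loads(first_line)
    return {
        # Never grant more accelerators than are free, and never fewer than one.
        "accelerators": max(1, min(int(requested["accelerators"]), meta.free_accelerators)),
        "max_output_tokens": max(1, int(requested["max_output_tokens"])),
    }


meta = ResourceMetadata(free_accelerators=4, free_memory_gb=48.0, queue_depth=12)
prompt = build_prompt("Summarize the attached report in three bullet points.", meta)

# Stand-in for a real model response (assumed format); no model is called here.
fake_output = '{"accelerators": 8, "max_output_tokens": 256}\n- point one\n...'
allocation = parse_allocation(fake_output, meta)
```

In this sketch the serving layer still caps the model's request at the resources that are actually free. That cap is a design choice of the example rather than something the abstract states: it keeps a malformed or overly optimistic prediction from over-committing hardware while leaving the allocation decision itself to the LLM.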

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Shin, D, "Language Models with Self-Contained Computational Resource Scheduling", Technical Disclosure Commons, (September 27, 2023)
https://www.tdcommons.org/dpubs_series/6286


Technical Disclosure Commons

Defensive Publications Series

Language Models with Self-Contained Computational Resource Scheduling
