Inventor(s)

D Shin

Abstract

Server-based large language model (LLM) interfaces that support a large user base require efficient allocation of computational resources to deliver responses in a timely manner. First-in, first-out (FIFO) allocation of queries to the LLM can result in unpredictable and/or long median wait times, depending on the arrival rate of queries and the per-query processing time. This disclosure describes techniques that improve resource allocation for an LLM by detecting a contextual pause and enabling next-query processing during the pause. A transformer design and allocation scheme is presented in which the transformer decoder is retrained with a contextual pause token that can be emitted at the output layer autoregressively. The contextual pause token marks and splits parts of a large paragraph into chunks that have contextual consistency. The token is used to dynamically adjust inference prioritization, favoring users who have not yet received any response over users who have reached an early contextual pause token and can take time to digest the information already delivered. The described techniques can enable shorter wait times on average without degrading the user experience.
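The disclosure describes the pause-token-aware allocation scheme at a high level without an implementation. The sketch below illustrates one way such a scheduler could work, assuming a hypothetical PAUSE_TOKEN_ID assigned to the contextual pause token during decoder retraining and a caller-supplied decode_fn that performs one autoregressive decoding step; the class names and priority levels are illustrative assumptions, not part of the disclosure.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical: ID assigned to the contextual pause token when the decoder
# vocabulary is extended during retraining (not specified in the disclosure).
PAUSE_TOKEN_ID = 50257

@dataclass(order=True)
class Session:
    priority: int                  # lower value = served sooner
    seq: int                       # FIFO tie-breaker within a priority class
    sid: str = field(compare=False, default="")
    tokens: list = field(compare=False, default_factory=list)

class PauseAwareScheduler:
    """Priority scheduler: users who have not received any output outrank
    users whose response is paused at a contextual pause token."""

    P_UNSERVED, P_ACTIVE, P_PAUSED = 0, 1, 2

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, sid):
        # New queries enter at the highest priority.
        heapq.heappush(self._heap,
                       Session(self.P_UNSERVED, next(self._counter), sid))

    def step(self, decode_fn):
        """Run one autoregressive decode step for the top session, then
        requeue it with a priority reflecting whether it just emitted
        the contextual pause token."""
        if not self._heap:
            return None
        s = heapq.heappop(self._heap)
        token = decode_fn(s.sid, s.tokens)  # one decoder forward pass
        s.tokens.append(token)
        # A pause token marks the end of a contextually consistent chunk;
        # the user can digest it, so yield capacity to unserved users.
        s.priority = self.P_PAUSED if token == PAUSE_TOKEN_ID else self.P_ACTIVE
        s.seq = next(self._counter)
        heapq.heappush(self._heap, s)
        return s.sid, token
```

With this ordering, a query that has produced no tokens is always served before one paused at a contextual chunk boundary, which matches the prioritization behavior the abstract describes; within a priority class, FIFO order is preserved by the sequence counter.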

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
