Defensive Publications Series

Token-Aware Capacity Management with Multi-Level Budget Hierarchy for LLM-Based Recommendation Systems

Anonymous

Abstract

Token-aware capacity management is described for LLM-based recommendation serving. A request token cost is estimated as a sum of token components including profile, candidate, context, and output tokens. Admission control evaluates the estimated cost against a four-level hierarchical token budget comprising a global budget, per-surface budgets, per-user budgets derived from user tiers, and a per-request cap. For sub-100 ms serving paths, serving instances perform budget checks and deductions using local leased counters obtained asynchronously from a central token pool, enabling constant-time decisions without network round-trips and non-blocking lease renewal. When budgets are insufficient, a graceful degradation orchestrator selects among full LLM ranking, partial LLM ranking on a top-K subset of candidates, or traditional ranking, including surface-priority shedding using a reserved token pool. Capacity planning integrates token throughput sizing alongside compute-based sizing.
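The admission and degradation flow summarized above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the names `Request`, `Budgets`, `estimate_cost`, and `choose_mode` are hypothetical, candidate tokens are assumed to scale linearly with the number of candidates, and the per-user budget is assumed to be a fixed allowance looked up by tier. Lease renewal from the central pool and the reserved pool for surface-priority shedding are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    surface: str
    user_id: str
    tier: str
    profile_tokens: int
    tokens_per_candidate: int  # assumption: candidate cost is linear in count
    num_candidates: int
    context_tokens: int
    output_tokens: int


def estimate_cost(req, num_candidates=None):
    """Sum the profile, candidate, context, and output token components."""
    n = req.num_candidates if num_candidates is None else num_candidates
    return (req.profile_tokens + n * req.tokens_per_candidate
            + req.context_tokens + req.output_tokens)


@dataclass
class Budgets:
    """Four levels: global, per-surface, per-user (by tier), per-request cap."""
    global_remaining: int
    surface_remaining: dict
    tier_allowance: dict  # hypothetical mapping: tier name -> per-user budget
    per_request_cap: int
    user_spent: dict = field(default_factory=dict)

    def admits(self, req, cost):
        # A request is admitted only if every level can cover the cost.
        user_left = self.tier_allowance[req.tier] - self.user_spent.get(req.user_id, 0)
        return (cost <= self.per_request_cap
                and cost <= user_left
                and cost <= self.surface_remaining[req.surface]
                and cost <= self.global_remaining)

    def deduct(self, req, cost):
        # Charge the cost against every counted level.
        self.global_remaining -= cost
        self.surface_remaining[req.surface] -= cost
        self.user_spent[req.user_id] = self.user_spent.get(req.user_id, 0) + cost


def choose_mode(budgets, req, top_k):
    """Try full LLM ranking, then top-K partial ranking, else traditional."""
    full = estimate_cost(req)
    if budgets.admits(req, full):
        budgets.deduct(req, full)
        return "full_llm", full
    partial = estimate_cost(req, min(top_k, req.num_candidates))
    if budgets.admits(req, partial):
        budgets.deduct(req, partial)
        return "partial_llm", partial
    return "traditional", 0  # no LLM tokens are spent on this path
```

In this sketch every check and deduction is a dictionary lookup or integer comparison, so the decision takes constant time per request. A serving instance following the leased-counter design would presumably hold these counters locally and top them up from the central pool in the background; that renewal step is not modeled here.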

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Token-Aware Capacity Management with Multi-Level Budget Hierarchy for LLM-Based Recommendation Systems", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10712
