Defensive Publications Series

Per-User KV Cache with Active User Windowing and Tiered Memory Placement for Billion-Scale LLM Recommendation

Anonymous

Abstract

The disclosed techniques manage per-user attention key-value (KV) caches for large language model (LLM) recommendation serving at very large user scale. A cache manager maintains activity windows that bound the number of users whose KV cache tensors are retained. KV caches for users active within a hot window are stored in GPU high-bandwidth memory at higher precision, while KV caches for users active within a warm window are stored in host RAM in a more compact quantized format. An entry is quantized and demoted from the hot tier to the warm tier once its inactivity exceeds a hot threshold, dequantized and promoted back to the hot tier on reuse, and evicted to a cold state once its inactivity exceeds a warm threshold, after which a later request recomputes the KV cache. An activity-aware eviction policy may use a Poisson return model to prioritize retention. Sticky routing via consistent hashing maps repeat requests from a user to the same serving instance to improve cache hit rate.
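The hot, warm, and cold lifecycle described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the threshold values, the symmetric int8 quantization scheme, the use of plain Python lists in place of GPU and host tensors, and the names `TieredKVCache`, `Entry`, `sweep`, and `return_probability`. The Poisson helper shows one plausible way a return model could score entries for retention; the disclosure does not specify the exact formula.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HOT_WINDOW_S = 600.0      # assumed hot inactivity threshold, in seconds
WARM_WINDOW_S = 86_400.0  # assumed warm inactivity threshold, in seconds


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization; returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Inverse of quantize_int8, up to rounding error."""
    return [c * scale for c in codes]


def return_probability(visits_per_s: float, horizon_s: float) -> float:
    """P(at least one request within the horizon) under Poisson arrivals."""
    return 1.0 - math.exp(-visits_per_s * horizon_s)


@dataclass
class Entry:
    tier: str                          # "hot" or "warm"
    last_seen: float                   # timestamp of the last request
    data: List[float]                  # higher-precision values while hot
    codes: Optional[List[int]] = None  # int8 codes while warm
    scale: float = 1.0


class TieredKVCache:
    """Toy stand-in: lists model tensors, a dict models both memory tiers."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def put(self, user: str, kv: List[float], now: float) -> None:
        """Store a freshly computed KV cache in the hot tier."""
        self.entries[user] = Entry("hot", now, list(kv))

    def get(self, user: str, now: float) -> Optional[List[float]]:
        """Return KV values, promoting a warm entry; None means cold."""
        entry = self.entries.get(user)
        if entry is None:
            return None  # caller recomputes the KV cache
        if entry.tier == "warm":
            entry.data = dequantize_int8(entry.codes or [], entry.scale)
            entry.codes = None
            entry.tier = "hot"
        entry.last_seen = now
        return entry.data

    def sweep(self, now: float) -> None:
        """Demote idle hot entries and evict entries idle past the warm window."""
        for user in list(self.entries):
            entry = self.entries[user]
            idle = now - entry.last_seen
            if idle > WARM_WINDOW_S:
                del self.entries[user]
            elif entry.tier == "hot" and idle > HOT_WINDOW_S:
                entry.codes, entry.scale = quantize_int8(entry.data)
                entry.data = []
                entry.tier = "warm"

    def tier_of(self, user: str) -> str:
        entry = self.entries.get(user)
        return entry.tier if entry else "cold"
```

In this sketch a periodic `sweep` enforces both windows, while `get` handles promotion on reuse. Under memory pressure, a policy could rank entries by `return_probability` computed from each user's observed request rate and retain the highest-scoring ones first; that ranking step is left out here to keep the example short.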

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Per-User KV Cache with Active User Windowing and Tiered Memory Placement for Billion-Scale LLM Recommendation", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10713
