Abstract
Techniques are disclosed for closed-loop dynamic sizing of a multi-level embedding cache hierarchy. Runtime hit-rate statistics are monitored for multiple tiers, including a GPU HBM cache, a host DRAM block cache for an SSD-backed database, and an SSD read-ahead buffer. A marginal-benefit estimator determines, for each tier, the estimated reduction in expected access latency per unit of additional memory, using smoothed measurements and optionally combining Zipf-model fitting with finite-difference probing. A feedback controller, including a PID control law, computes bounded tier-size adjustments that reallocate a fixed total memory budget to reduce marginal-benefit imbalance across tiers. Phase transitions may be detected from the rate of change of the hit rate and used to reset integral state and temporarily boost controller gains. A safe resize executor applies changes gradually and supports rollback on post-resize regression. The approach reduces expected embedding access latency and improves memory utilization across workload phases.
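As an illustration only, one control step of the kind summarized above might be sketched as follows. The tier names, gains, step bound, and minimum tier size are hypothetical values, not taken from the disclosure, and the zero-sum centering with uniform scaling is just one possible way to keep the total budget fixed while bounding each adjustment.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID law; the gains here are illustrative assumptions."""
    kp: float = 0.5
    ki: float = 0.1
    kd: float = 0.0
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float = 1.0) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def rebalance(sizes, benefit, controllers, max_step, min_size):
    """Return new tier sizes that shift memory toward high-marginal-benefit tiers.

    sizes:   tier -> current size (arbitrary memory units)
    benefit: tier -> estimated latency reduction per extra unit of memory
    """
    # Error per tier: how far its marginal benefit sits above the mean.
    mean_benefit = sum(benefit.values()) / len(benefit)
    raw = {t: controllers[t].step(benefit[t] - mean_benefit) for t in sizes}

    # Center the outputs so the adjustments sum to zero (fixed total budget).
    mean_raw = sum(raw.values()) / len(raw)
    delta = {t: raw[t] - mean_raw for t in raw}

    # Scale all adjustments uniformly so none exceeds max_step and no tier
    # drops below min_size; uniform scaling preserves the zero sum.
    scale = 1.0
    largest = max(abs(d) for d in delta.values())
    if largest > max_step:
        scale = max_step / largest
    for t, d in delta.items():
        if d < 0:
            room = max(0.0, sizes[t] - min_size)
            scale = min(scale, room / -d)

    return {t: sizes[t] + scale * delta[t] for t in sizes}


# Hypothetical example: three tiers sharing 100 units of memory.
sizes = {"hbm": 40.0, "dram": 40.0, "ssd_buffer": 20.0}
benefit = {"hbm": 3.0, "dram": 1.0, "ssd_buffer": 2.0}
controllers = {t: PID() for t in sizes}
new_sizes = rebalance(sizes, benefit, controllers, max_step=4.0, min_size=8.0)
```

In this sketch the tier with above-average marginal benefit grows at the expense of the tier with below-average benefit, while the total stays constant. Phase detection, gradual application, and rollback, which the abstract also mentions, are omitted here.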
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Defensive Publication: Closed-Loop Feedback-Driven Dynamic Tier Sizing for Multi-Level Embedding Cache Hierarchies", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10663