Inventor(s)

Abstract

Techniques are disclosed for closed-loop dynamic sizing of a multi-level embedding cache hierarchy. Runtime hit-rate statistics are monitored for multiple tiers, including a GPU HBM cache, a host DRAM block cache for an SSD-backed database, and an SSD read-ahead buffer. A marginal-benefit estimator determines, for each tier, an estimated reduction in expected access latency per unit of additional memory, using smoothed measurements and optionally combining Zipf-model fitting with finite-difference probing. A feedback controller that includes a PID control law computes bounded tier size adjustments that reallocate a fixed total memory budget to reduce marginal-benefit imbalance across tiers. Phase transitions may be detected from the rate of change of the hit rate and used to reset integral state and temporarily boost controller gains. A safe resize executor applies changes gradually and supports rollback on post-resize regression. The approach reduces expected embedding access latency and improves memory utilization across workload phases.
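
The control loop described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `TierPID`, `finite_difference_benefit`, `phase_changed`, and `rebalance`, the gain values, the threshold, and the tier labels are all assumptions made for illustration. The sketch covers finite-difference estimation of marginal benefit, a per-tier PID step on the deviation from the mean marginal benefit, bounded adjustments that keep the total budget fixed, and an integral reset on a detected phase transition. Zipf-model fitting, gain boosting, gradual application, and rollback are omitted.

```python
from dataclasses import dataclass


@dataclass
class TierPID:
    """Per-tier PID state. Gains are illustrative, not taken from the disclosure."""
    kp: float = 10.0
    ki: float = 0.0
    kd: float = 0.0
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Standard discrete PID update with a unit time step.
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def reset(self) -> None:
        # Discard accumulated state, e.g. after a detected phase transition.
        self.integral = 0.0
        self.prev_error = 0.0


def finite_difference_benefit(latency_before, latency_after, delta_size):
    """Estimated latency reduction per unit of additional memory."""
    return (latency_before - latency_after) / delta_size


def phase_changed(prev_hit_rate, hit_rate, threshold=0.1):
    """Flag a phase transition when the hit rate moves by more than threshold."""
    return abs(hit_rate - prev_hit_rate) > threshold


def rebalance(sizes, benefits, controllers, max_step, min_size):
    """One control step: shift memory toward tiers whose marginal benefit is
    above the mean, with each change bounded and the total budget unchanged."""
    mean_benefit = sum(benefits.values()) / len(benefits)
    adjust = {}
    for tier, size in sizes.items():
        raw = controllers[tier].step(benefits[tier] - mean_benefit)
        bounded = max(-max_step, min(max_step, raw))
        # Never shrink a tier below its minimum size.
        adjust[tier] = max(bounded, min_size - size)
    grow = sum(a for a in adjust.values() if a > 0)
    shrink = -sum(a for a in adjust.values() if a < 0)
    moved = min(grow, shrink)  # memory that can actually change hands
    if moved == 0:
        return dict(sizes)
    new_sizes = {}
    for tier, a in adjust.items():
        # Scale the larger side down so growth and shrinkage cancel exactly.
        scale = moved / grow if a > 0 else moved / shrink
        new_sizes[tier] = sizes[tier] + a * scale
    return new_sizes


# Hypothetical tier sizes (arbitrary memory units) and marginal benefits.
sizes = {"hbm": 100.0, "dram": 100.0, "ssd_readahead": 100.0}
benefits = {"hbm": 3.0, "dram": 1.0, "ssd_readahead": 2.0}
controllers = {tier: TierPID() for tier in sizes}
new_sizes = rebalance(sizes, benefits, controllers, max_step=5.0, min_size=10.0)
```

In this example the HBM tier has above-average marginal benefit and grows, the DRAM tier shrinks by the same amount, and the read-ahead buffer, which sits at the mean, is left unchanged. Scaling the larger side down, instead of re-centering the adjustments, keeps every per-tier change within its bound while still conserving the budget.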

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
