Inventor(s)

Abstract

Techniques are described for cache observability in machine-learning embedding stores using dual-mode measurement with non-interfering consumers. A monotonic, non-resetting vector of counters tracks total lookups and tier-specific hits. Monitoring derives cumulative hit rates via non-mutating reads of the counters. Profiling derives windowed hit rates by capturing counter snapshots at boundaries and computing deltas, enabling virtual resets and overlapping windows without resetting shared state. The same counters are exposed across multiple software layers (e.g., native code, graph execution, and scripting) using scalar loads and simple arithmetic. A feedback controller maintains a ring buffer of per-batch deltas to compute sliding-window hit rates, detects regime changes by comparing portions of a window, and triggers cache reconfiguration. Conditional activation flags reduce overhead by bypassing counter updates when observability consumers are inactive.
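The dual-mode measurement described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `CacheCounters`, `Snapshot`, `windowed_hit_rate`, and `SlidingWindow`, the two-halves comparison, and the `threshold` default are all hypothetical, chosen only to show how cumulative and windowed hit rates can be derived from the same never-reset counters.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable copy of the counter vector at one boundary (hypothetical type)."""
    lookups: int
    hits: tuple  # per-tier hit counts


class CacheCounters:
    """Monotonic counters: total lookups plus per-tier hits. Never reset."""

    def __init__(self, num_tiers, enabled=True):
        self.enabled = enabled  # conditional activation flag (assumed boolean)
        self.lookups = 0
        self.hits = [0] * num_tiers

    def record(self, hit_tier=None):
        """Count one lookup; hit_tier=None means a miss in every tier."""
        if not self.enabled:
            return  # bypass updates when no observability consumer is active
        self.lookups += 1
        if hit_tier is not None:
            self.hits[hit_tier] += 1

    def snapshot(self):
        """Non-mutating read of the whole counter vector."""
        return Snapshot(self.lookups, tuple(self.hits))

    def cumulative_hit_rate(self):
        """Monitoring mode: hit rate over the counters' whole lifetime."""
        return sum(self.hits) / self.lookups if self.lookups else 0.0


def windowed_hit_rate(start, end):
    """Profiling mode: hit rate between two snapshots (a 'virtual reset')."""
    lookups = end.lookups - start.lookups
    hits = sum(end.hits) - sum(start.hits)
    return hits / lookups if lookups else 0.0


class SlidingWindow:
    """Ring buffer of per-batch (lookups, hits) deltas."""

    def __init__(self, counters, capacity):
        self.counters = counters
        self.last = counters.snapshot()
        self.deltas = deque(maxlen=capacity)  # oldest batch drops off when full

    def end_batch(self):
        """Record the delta since the previous batch boundary."""
        now = self.counters.snapshot()
        self.deltas.append((now.lookups - self.last.lookups,
                            sum(now.hits) - sum(self.last.hits)))
        self.last = now

    @staticmethod
    def _rate(deltas):
        lookups = sum(d[0] for d in deltas)
        hits = sum(d[1] for d in deltas)
        return hits / lookups if lookups else 0.0

    def hit_rate(self):
        """Sliding-window hit rate over the buffered batches."""
        return self._rate(self.deltas)

    def regime_changed(self, threshold=0.2):
        """Compare older and newer halves of the window (assumed heuristic)."""
        items = list(self.deltas)
        half = len(items) // 2
        if half == 0:
            return False
        return abs(self._rate(items[half:]) - self._rate(items[:half])) > threshold
```

Because every consumer only reads the counters and keeps its own snapshots, several profiling windows can overlap with each other and with cumulative monitoring without interfering. In this sketch, a caller would invoke `end_batch()` once per batch and treat a `True` result from `regime_changed()` as the signal to reconfigure the cache.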

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
