Abstract
Techniques are described for predictive prefetch and cache management in SSD-backed embedding table systems used in iterative machine learning training. Online access tracking maintains per-row exponentially decayed access frequency and last-access information, and computes batch-level autocorrelation using set similarity over a sliding window. An online estimator computes a discrete Zipf exponent via maximum likelihood estimation using Newton updates with warm start and step-size clamping. A lightweight predictor asynchronously computes a prefetch score for each embedding row by combining Zipf-normalized frequency, temporal recency, and batch-correlation signals, with weights adapted by multiplicative updates, and produces a ranked prefetch list with confidence-weighted priorities. A PID-based controller adjusts prefetch depth based on prefetch hit rate feedback. The cache eviction policy is either selected from, or blended between, frequency-based and recency-based eviction using a sigmoid mixing coefficient derived from the Zipf exponent. Prefetch reads are issued to the SSD according to the ranked list and adaptive depth while cache insertions and evictions are managed.
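As a rough illustration of two of the steps summarized above, the following Python sketch shows one way a clamped Newton update for a discrete Zipf exponent and a sigmoid eviction blend could look. It is a minimal sketch under stated assumptions, not the disclosed implementation: the function names, the finite rank support 1..n_ranks, the clamp bounds, and the sigmoid midpoint and steepness are all hypothetical choices made for the example.

```python
import math

def zipf_newton_step(s, mean_log_rank, n_ranks,
                     max_step=0.5, s_min=0.01, s_max=5.0):
    """One clamped Newton update of the Zipf exponent s.

    Assumes ranks are drawn from P(k) proportional to k**(-s), k = 1..n_ranks.
    mean_log_rank is the observed average of ln(rank) over accesses.
    max_step, s_min and s_max are illustrative clamp values.
    """
    logs = [math.log(k) for k in range(1, n_ranks + 1)]
    weights = [math.exp(-s * lk) for lk in logs]
    norm = sum(weights)
    e1 = sum(w * lk for w, lk in zip(weights, logs)) / norm       # E[ln k]
    e2 = sum(w * lk * lk for w, lk in zip(weights, logs)) / norm  # E[(ln k)^2]
    grad = e1 - mean_log_rank      # per-sample log-likelihood gradient
    hess = -(e2 - e1 * e1)         # negative variance of ln k under the model
    if hess >= -1e-12:             # degenerate support: keep current estimate
        return s
    step = -grad / hess
    step = max(-max_step, min(max_step, step))   # step-size clamping
    return max(s_min, min(s_max, s + step))

def estimate_zipf_exponent(ranks, n_ranks, s_init=1.0, iters=20, tol=1e-6):
    """Iterate Newton steps from a warm start s_init (e.g. the previous estimate)."""
    mean_log_rank = sum(math.log(r) for r in ranks) / len(ranks)
    s = s_init
    for _ in range(iters):
        s_next = zipf_newton_step(s, mean_log_rank, n_ranks)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

def mixing_coefficient(s, midpoint=1.0, steepness=4.0):
    """Sigmoid of the Zipf exponent; midpoint and steepness are assumed values."""
    return 1.0 / (1.0 + math.exp(-steepness * (s - midpoint)))

def blended_eviction_score(freq_score, recency_score, s):
    """Weight frequency more heavily as the access distribution gets more skewed."""
    alpha = mixing_coefficient(s)
    return alpha * freq_score + (1.0 - alpha) * recency_score
```

In this sketch, passing the previous estimate as s_init provides the warm start, so a slowly drifting access distribution typically needs only a few Newton steps per update. A larger exponent (more skew) pushes the mixing coefficient toward 1 and so favors frequency-based eviction, while a flatter distribution favors recency.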
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Machine-Learning-Driven Predictive Prefetch for SSD-Backed Embedding Tables Using Access Pattern Modeling", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10662