Inventor(s)

Abstract

Techniques are described for prefetching embeddings from SSD-backed storage using a decoupled pipeline. Cache misses are identified by a caller thread and enqueued to a miss queue. A dedicated reader stage dequeues misses, issues asynchronous multi-key storage reads, and forwards completed results individually to a fill queue in completion order rather than request order. A dedicated cache filler stage dequeues fill entries and inserts embedding values into one or more caches, optionally selecting fill work using a priority policy that accounts for urgency (e.g., embeddings awaited by a forward pass), access frequency, and batch position. The reader stage may adapt storage read batch size based on queue depth, storage utilization, and latency signals. Lock-free queues decouple I/O completion handling from cache insertion to reduce head-of-line blocking and enable overlap of storage reads and cache filling.
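The decoupling described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the disclosure: the names `FAKE_STORAGE`, `read_embedding`, `reader_stage`, `filler_stage`, and `run_pipeline` are hypothetical, storage latency is simulated with `time.sleep`, and Python's `queue.Queue` (which is lock-based, not lock-free) stands in for the lock-free queues. The priority policy and adaptive batch sizing are omitted for brevity.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for SSD-backed storage: key -> (latency in s, embedding).
FAKE_STORAGE = {
    "a": (0.30, [1.0, 1.0]),
    "b": (0.02, [2.0, 2.0]),
    "c": (0.15, [3.0, 3.0]),
}

_STOP = object()  # sentinel that shuts each stage down


def read_embedding(key):
    """Simulate a single storage read with a per-key latency."""
    latency, value = FAKE_STORAGE[key]
    time.sleep(latency)
    return key, value


def reader_stage(miss_q, fill_q, batch_size=8):
    """Dequeue misses, read them concurrently, forward in completion order."""
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        done = False
        while not done:
            batch = [miss_q.get()]  # block until at least one miss arrives
            while len(batch) < batch_size:
                try:
                    batch.append(miss_q.get_nowait())
                except queue.Empty:
                    break
            if any(item is _STOP for item in batch):
                done = True
                batch = [item for item in batch if item is not _STOP]
            futures = [pool.submit(read_embedding, key) for key in batch]
            # Each result is forwarded as soon as it completes, so a slow
            # read does not hold back faster reads from the same batch.
            for future in as_completed(futures):
                fill_q.put(future.result())
    fill_q.put(_STOP)


def filler_stage(fill_q, cache, fill_order):
    """Dequeue completed reads and insert them into the cache."""
    while True:
        item = fill_q.get()
        if item is _STOP:
            break
        key, value = item
        cache[key] = value
        fill_order.append(key)


def run_pipeline(missed_keys):
    """Run both stages over a list of missed keys; return cache and fill order."""
    miss_q, fill_q = queue.Queue(), queue.Queue()
    cache, fill_order = {}, []
    # Enqueue all misses first so this demo issues them as one batch.
    for key in missed_keys:
        miss_q.put(key)
    miss_q.put(_STOP)
    reader = threading.Thread(target=reader_stage, args=(miss_q, fill_q))
    filler = threading.Thread(target=filler_stage, args=(fill_q, cache, fill_order))
    reader.start()
    filler.start()
    reader.join()
    filler.join()
    return cache, fill_order
```

With the simulated latencies above, calling `run_pipeline(["a", "b", "c"])` would be expected to fill the cache in the order `b`, `c`, `a`: the slow read for `a` does not delay insertion of the faster results, which is the head-of-line blocking the abstract aims to avoid.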

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
