Defensive Publications Series

Pipelined Double-Buffer Eviction Scheduling with GPUDirect Storage for Zero-Copy Tiered Embedding Management

Anonymous

Abstract

Techniques are described for tiered embedding management in which embedding rows are evicted from GPU memory to NVMe SSDs and prefetched from SSDs back into GPU memory using GPU-direct storage DMA that bypasses CPU memory. Two page-aligned GPU-resident buffers are registered with a GPU-direct storage interface and alternated by a double-buffer scheduler, so that one buffer serves embedding accesses for model computation while I/O proceeds on the other. Eviction uses GPU-direct writes from GPU memory to SSD; prefetch uses GPU-direct reads into a GPU-resident buffer, followed by a scatter of the rows to their target GPU addresses. A batch-coalescing layer groups small per-row transfers by SSD offset, compacts the data into contiguous GPU regions, and issues fewer, larger I/O operations to reduce per-call overhead. Runtime detection selects the zero-copy path when GPU-direct storage is available and a fallback transfer mode when it is not.
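The coalescing layer and the double-buffer alternation can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not the disclosed implementation, and it makes no GPUDirect Storage calls.

- Two `bytearray` objects stand in for the two registered GPU buffers.
- A single worker thread stands in for the I/O path.
- `coalesce` merges only requests whose SSD extents are exactly adjacent.
- `RowRequest`, `CoalescedIO`, `coalesce`, `run_double_buffered` and the 1 MiB `max_io_bytes` cap are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class RowRequest:  # hypothetical per-row transfer request
    row_id: int
    ssd_offset: int  # byte offset of the embedding row on the SSD
    nbytes: int      # size of the row in bytes


@dataclass
class CoalescedIO:  # hypothetical descriptor for one larger I/O call
    ssd_offset: int  # start of one contiguous SSD extent
    nbytes: int      # bytes moved by this single I/O call
    # (row_id, byte offset of the row inside this I/O's payload), used for the later scatter
    rows: List[Tuple[int, int]] = field(default_factory=list)


def coalesce(requests: Sequence[RowRequest], max_io_bytes: int = 1 << 20) -> List[CoalescedIO]:
    """Sort per-row requests by SSD offset and merge exactly adjacent extents into larger I/Os."""
    ops: List[CoalescedIO] = []
    for req in sorted(requests, key=lambda r: r.ssd_offset):
        last = ops[-1] if ops else None
        if (last is not None
                and last.ssd_offset + last.nbytes == req.ssd_offset
                and last.nbytes + req.nbytes <= max_io_bytes):
            last.rows.append((req.row_id, last.nbytes))  # row lands right after the previous one
            last.nbytes += req.nbytes
        else:
            ops.append(CoalescedIO(req.ssd_offset, req.nbytes, [(req.row_id, 0)]))
    return ops


def run_double_buffered(
    batches: Sequence[object],
    load: Callable[[object, bytearray], None],
    compute: Callable[[bytearray], object],
    buf_bytes: int,
) -> List[object]:
    """Overlap I/O into one buffer with computation on the other, swapping roles every step."""
    buffers = [bytearray(buf_bytes), bytearray(buf_bytes)]  # stand-ins for the two staging buffers
    results: List[object] = []
    if not batches:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_worker:  # plays the role of the I/O path
        pending = io_worker.submit(load, batches[0], buffers[0])
        for i in range(len(batches)):
            pending.result()  # I/O for batch i into buffers[i % 2] has completed (re-raises errors)
            if i + 1 < len(batches):
                # Fill the other buffer; compute(i - 1) finished reading it in the previous step.
                pending = io_worker.submit(load, batches[i + 1], buffers[(i + 1) % 2])
            results.append(compute(buffers[i % 2]))  # runs while the next load is in flight
    return results


if __name__ == "__main__":
    reqs = [RowRequest(7, 1024, 512), RowRequest(3, 0, 512),
            RowRequest(5, 512, 512), RowRequest(9, 8192, 512)]
    print(len(coalesce(reqs)))  # 2: rows 3, 5, 7 share one extent; row 9 stands alone

    def fill(batch, buf):  # hypothetical loader: writes the batch id into every byte
        buf[:] = bytes([batch]) * len(buf)

    print(run_double_buffered([1, 2, 3], fill, lambda buf: buf[0] * 2, buf_bytes=8))  # [2, 4, 6]
```

In this sketch, the buffer being refilled is always the one whose computation completed in the previous step. Because of that, no extra locking is needed between the compute role and the I/O role.

The `rows` offsets recorded by `coalesce` are what a subsequent scatter step would use to copy each row from the compacted region to its target address.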

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Pipelined Double-Buffer Eviction Scheduling with GPUDirect Storage for Zero-Copy Tiered Embedding Management", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10695
