Abstract

Techniques are described for SSD-backed embedding tables using a compression-aware, asymmetric-precision pipeline. Embedding rows are stored on SSD in reduced-precision integer form (e.g., INT8 or INT4) together with per-row affine quantization parameters including a scale and a zero-point. Requested rows are transferred in compressed form into a GPU-accessible staging buffer, and a GPU kernel dequantizes the rows into a full-precision representation (e.g., FP16/FP32) within a high-bandwidth-memory cache. A double-buffer schedule overlaps SSD reads with GPU dequantization across iterations. When cached embeddings are modified and evicted, a GPU kernel re-quantizes the updated full-precision values, updates per-row quantization parameters, and writes the reduced-precision rows back to SSD. A codebook manager may track distribution drift and adapt quantization policies over time.
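The per-row affine scheme in the abstract can be illustrated with a minimal CPU-side sketch. It is an illustration under stated assumptions, not the disclosed GPU kernels: the function names `quantize_row` and `dequantize_row` are hypothetical, the code range is assumed to be unsigned 8-bit, and the scale and zero-point are assumed to come from simple min/max calibration of each row.

```python
from typing import List, Tuple

# Assumed code range: unsigned 8-bit. The source says only "INT8 or INT4".
QMIN, QMAX = 0, 255


def quantize_row(row: List[float]) -> Tuple[List[int], float, int]:
    """Affine-quantize one embedding row; returns (codes, scale, zero_point).

    Assumption: min/max calibration per row. This stands in for the
    re-quantization step applied before a modified row is written back to SSD.
    """
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(row), 0.0)
    hi = max(max(row), 0.0)
    # Fall back to a scale of 1.0 when the row is all zeros.
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    zero_point = int(round(QMIN - lo / scale))
    zero_point = max(QMIN, min(QMAX, zero_point))
    # Round each value to the nearest step, shift by the zero-point, clamp.
    codes = [
        max(QMIN, min(QMAX, int(round(x / scale)) + zero_point))
        for x in row
    ]
    return codes, scale, zero_point


def dequantize_row(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate full-precision values from the stored codes.

    Stands in for the dequantization step applied when a row is loaded
    from the staging buffer into the cache.
    """
    return [scale * (q - zero_point) for q in codes]


if __name__ == "__main__":
    row = [-1.0, 0.0, 0.5, 1.0]
    codes, scale, zero_point = quantize_row(row)
    restored = dequantize_row(codes, scale, zero_point)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"scale={scale:.6f} zero_point={zero_point} worst_error={worst:.6f}")
```

Under these assumptions the round-trip error per element is bounded by roughly half a quantization step (`scale / 2`), which is why a per-row scale helps: rows with a narrow value range get a proportionally finer step.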

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Compression-Aware Asymmetric-Precision Pipeline for SSD-Backed Embedding Tables", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10660

Technical Disclosure Commons

Defensive Publications Series

Compression-Aware Asymmetric-Precision Pipeline for SSD-Backed Embedding Tables
