Inventor(s)

Abstract

A multi-node training system uses tiered embedding storage with a high-bandwidth memory cache and local persistent storage on each node. Each node maintains a probabilistic cache directory, such as a Bloom filter, that encodes identifiers of cached embedding rows and periodically exchanges the directory with peer nodes, optionally using delta updates. Upon a local cache miss, a fetch decision engine consults received directories to identify candidate peers and selects between a one-sided RDMA read from a peer’s cache and a local persistent-store read based on estimated costs and load limits. Remotely retrieved embeddings are accepted subject to a staleness threshold evaluated using per-row version information, which may be tracked sparsely. Load balancing limits remote serving rates and provides fallback to local access on timeout, overload, or network unavailability.
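The miss-handling path described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class and function names (`BloomDirectory`, `Peer`, `choose_source`, `accept_remote`), the cost fields, the in-flight load limit, and the version-gap staleness rule are all assumptions introduced here for illustration, and the actual RDMA transfer and directory exchange are omitted.

```python
import hashlib
from dataclasses import dataclass


class BloomDirectory:
    """Probabilistic set of cached embedding-row IDs (false positives possible)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, row_id: int):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{row_id}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, row_id: int) -> None:
        for p in self._positions(row_id):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, row_id: int) -> bool:
        # True means "possibly cached"; False means "definitely not cached".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(row_id))


@dataclass
class Peer:
    name: str
    directory: BloomDirectory   # last directory received from this peer
    rdma_cost_us: float         # assumed cost estimate for a one-sided read
    inflight: int = 0           # remote reads this peer is currently serving
    max_inflight: int = 8       # assumed per-peer load limit


def choose_source(row_id: int, peers: list, local_cost_us: float) -> str:
    """Return the name of the cheapest eligible peer, or "local" as the fallback."""
    best_name, best_cost = "local", local_cost_us
    for peer in peers:
        if peer.inflight >= peer.max_inflight:
            continue  # peer is at its load limit, skip it
        if not peer.directory.might_contain(row_id):
            continue  # directory rules this peer out
        if peer.rdma_cost_us < best_cost:
            best_name, best_cost = peer.name, peer.rdma_cost_us
    return best_name


def accept_remote(remote_version: int, latest_known_version: int, max_staleness: int) -> bool:
    """Accept a remotely fetched row only if its version lag is within the threshold."""
    return latest_known_version - remote_version <= max_staleness
```

Because a Bloom filter can report false positives, a peer chosen this way may not actually hold the row, so a caller would still need the fallback to the local persistent store that the abstract describes for timeouts, overload, or network unavailability.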

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
