Abstract

Techniques are described for speculative data prefetching in systems in which an LLM agent dynamically invokes retrieval backends whose candidates require embedding lookups from SSD-backed embedding tables. A user memory state comprising tiered memory data is obtained and converted into features. From these features, an intent prediction model produces, for each backend, a probability that the agent will invoke that backend. A prefetch controller computes a backend-specific prefetch decision using a value metric that combines the invocation probability, the expected number of embedding lookups, SSD latency, and prefetch cost, and selects backends subject to a bandwidth budget derived from SSD throughput and the Phase 1 time window. Speculative SSD reads are issued asynchronously during Phase 1 to populate a cache before tool invocation in Phase 2. Mid-flight reasoning signals may update the probabilities, enabling cancellation or late initiation of prefetches. Observed invocations may be logged for online refinement of the intent prediction model.
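The selection step can be illustrated with a minimal sketch. The abstract does not give the exact value metric or selection rule, so everything below is an assumption made for illustration: the value is taken as expected latency saved (probability × expected lookups × SSD latency) minus a per-byte prefetch cost, the budget is SSD throughput multiplied by the Phase 1 window, and backends are chosen greedily by value per byte. The names `Backend`, `prefetch_value`, `select_backends`, `bytes_per_lookup`, and `cost_per_byte_s` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    prob: float            # predicted probability the agent invokes this backend
    expected_lookups: int  # expected embedding lookups if it is invoked
    bytes_per_lookup: int  # assumed size of one embedding row on SSD


def prefetch_value(b: Backend, ssd_latency_s: float, cost_per_byte_s: float) -> float:
    """Assumed metric: expected latency saved minus the cost of the reads."""
    saved = b.prob * b.expected_lookups * ssd_latency_s
    cost = b.expected_lookups * b.bytes_per_lookup * cost_per_byte_s
    return saved - cost


def select_backends(
    backends: List[Backend],
    ssd_throughput_bps: float,
    phase1_window_s: float,
    ssd_latency_s: float,
    cost_per_byte_s: float,
) -> List[str]:
    """Greedily pick positive-value backends that fit the bandwidth budget."""
    # Budget: bytes the SSD can deliver within the Phase 1 window.
    budget_bytes = ssd_throughput_bps * phase1_window_s

    scored = []
    for b in backends:
        size = b.expected_lookups * b.bytes_per_lookup
        value = prefetch_value(b, ssd_latency_s, cost_per_byte_s)
        if value > 0 and size > 0:
            scored.append((value / size, size, b.name))

    # Highest value per byte first (a heuristic, not an optimal knapsack).
    scored.sort(key=lambda t: t[0], reverse=True)

    chosen, used = [], 0
    for _, size, name in scored:
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen


if __name__ == "__main__":
    candidates = [
        Backend("search", prob=0.9, expected_lookups=1000, bytes_per_lookup=512),
        Backend("catalog", prob=0.5, expected_lookups=4000, bytes_per_lookup=512),
        Backend("maps", prob=0.001, expected_lookups=1000, bytes_per_lookup=512),
    ]
    print(select_backends(candidates, 2e9, 0.001, 100e-6, 1e-9))
```

In this sketch a low-probability backend is dropped because its assumed value is negative, and a useful but large backend is skipped when it does not fit in the window's byte budget. An actual controller could use a different metric or an exact knapsack solver, and would also need to handle the mid-flight probability updates described above.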

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
