Abstract
Systems and methods are described for caching large language model (LLM) reasoning outputs in recommendation services using semantic profile versioning. Each cache entry maps a key consisting of a user identifier, a user profile version, and a candidate identifier to a reasoning output that includes a score, a tier, and a rationale. In one embodiment, the profile version is incremented when a profile update delta is classified as a semantic change. Semantic changes include preference additions or removals, profile regeneration, and, optionally, confidence updates that cross configured thresholds. Session and timestamp updates are treated as non-semantic. A two-level cache, consisting of a session cache and a cross-session cache, may be used. Upon a semantic change, partial invalidation is performed: affected candidates are identified using topic affinity, their entries are invalidated, and entries for unaffected candidates are migrated to the new profile version so that they remain reusable across sessions.
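A minimal Python sketch of the scheme summarized above follows. It is an illustration under stated assumptions, not the disclosed implementation. The delta kinds (`preference_added`, `confidence_update`, and so on), the threshold values, the per-candidate topic-affinity map compared against `affinity_threshold`, and all class and method names are hypothetical. The sketch shows the (user, profile version, candidate) cache key, semantic-change classification, a session cache backed by a cross-session cache, and partial invalidation in which entries for unaffected candidates are re-keyed to the new version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical delta kinds; the disclosure names the categories but not a schema.
SEMANTIC_KINDS = {"preference_added", "preference_removed", "profile_regenerated"}
# "session_update", "timestamp_update" (and any unknown kind) are non-semantic.

Key = Tuple[str, int, str]  # (user_id, profile_version, candidate_id)


@dataclass(frozen=True)
class ProfileDelta:
    kind: str
    topic: Optional[str] = None  # preference topic touched by the delta (assumed field)
    old_confidence: Optional[float] = None
    new_confidence: Optional[float] = None


@dataclass(frozen=True)
class ReasoningOutput:
    score: float
    tier: str
    rationale: str


def is_semantic(delta: ProfileDelta, thresholds: Tuple[float, ...] = (0.5,)) -> bool:
    """Classify a profile update delta as semantic (version-bumping) or not."""
    if delta.kind in SEMANTIC_KINDS:
        return True
    if (delta.kind == "confidence_update"
            and delta.old_confidence is not None
            and delta.new_confidence is not None):
        # Optional rule: semantic only if the update crosses a configured threshold.
        lo, hi = sorted((delta.old_confidence, delta.new_confidence))
        return any(lo < t <= hi for t in thresholds)
    return False


class TwoLevelReasoningCache:
    """Hypothetical session + cross-session cache keyed by profile version."""

    def __init__(self, topic_affinity: Dict[str, Dict[str, float]],
                 affinity_threshold: float = 0.3):
        self.session: Dict[Key, ReasoningOutput] = {}
        self.cross_session: Dict[Key, ReasoningOutput] = {}
        self.versions: Dict[str, int] = {}
        self.topic_affinity = topic_affinity  # candidate_id -> {topic: affinity}
        self.affinity_threshold = affinity_threshold

    def _key(self, user_id: str, candidate_id: str) -> Key:
        return (user_id, self.versions.get(user_id, 0), candidate_id)

    def get(self, user_id: str, candidate_id: str) -> Optional[ReasoningOutput]:
        k = self._key(user_id, candidate_id)
        if k in self.session:
            return self.session[k]
        if k in self.cross_session:
            self.session[k] = self.cross_session[k]  # promote to the session level
            return self.session[k]
        return None  # miss: the caller would run the LLM and call put()

    def put(self, user_id: str, candidate_id: str, output: ReasoningOutput) -> None:
        k = self._key(user_id, candidate_id)
        self.session[k] = output
        self.cross_session[k] = output

    def end_session(self) -> None:
        self.session.clear()  # cross-session entries survive

    def _affected(self, candidate_id: str, topics: Optional[Set[str]]) -> bool:
        if topics is None:  # regeneration or unknown topic: every candidate is affected
            return True
        affinity = self.topic_affinity.get(candidate_id, {})
        return any(affinity.get(t, 0.0) >= self.affinity_threshold for t in topics)

    def apply_delta(self, user_id: str, delta: ProfileDelta) -> int:
        old = self.versions.get(user_id, 0)
        if not is_semantic(delta):
            return old  # non-semantic: version and entries unchanged
        new = old + 1
        self.versions[user_id] = new
        topics = (None if delta.kind == "profile_regenerated" or delta.topic is None
                  else {delta.topic})
        for level in (self.session, self.cross_session):
            for uid, ver, cid in list(level):
                if uid != user_id or ver != old:
                    continue
                entry = level.pop((uid, ver, cid))  # invalidate by default
                if not self._affected(cid, topics):
                    level[(uid, new, cid)] = entry  # migrate unaffected entry
        return new
```

In this sketch, migration simply re-keys the stored output under the new version, so a later session can still hit the cross-session cache without another LLM call. Applying the same migration to both cache levels, and treating a delta with no topic as affecting every candidate, are assumptions rather than details taken from the disclosure.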
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Semantic Profile Version-Based Cache Invalidation for Machine Learning Inference Systems", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10718