Abstract

A unified lifecycle system supports dual-vocabulary, LLM-enhanced recommendations that use both natural-language tokens and semantic identifier (SID) tokens. Cross-architecture migration preserves accumulated SID knowledge by learning an embedding projection that minimizes pairwise similarity distortion and enforces neighborhood contraction for each SID's K nearest neighbors. The projection is followed by staged initialization, embedding warmup, continued pretraining with synthetic domain replay, and validation. Modality-aware distillation applies dual temperatures: a higher temperature for natural-language tokens and a lower temperature for SID tokens, which avoids catastrophic SID substitutions. It also uses long-tail SID importance weighting, SID embedding alignment, and a closed-loop controller that adjusts temperatures and loss weights when SID quality degrades. Serving uses staleness-tiered amortization across four tiers: offline item SID embeddings, near-line user embeddings, O(1) real-time adaptation, and an online lightweight fusion network. A cross-tier consistency protocol decays the real-time state upon each near-line refresh to prevent double-counting, enabling low latency and reduced memory usage.
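The migration objective can be sketched as a loss over a linear projection matrix. This is a minimal illustration under stated assumptions, not the disclosed method: the projection is assumed to be linear, pairwise similarity is measured with cosine similarity, and "neighborhood contraction" is read as a squared hinge that penalizes any increase in Euclidean distance between a SID and its K nearest old-space neighbors (which also assumes the two spaces have comparable scales). The names `migration_loss`, `k_nearest`, and the weight `lam` are hypothetical, and the optimizer that would actually fit the projection is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape (d_new, d_old)


def project(w: Matrix, x: Vector) -> Vector:
    """Apply a linear projection: returns w @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def k_nearest(points: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def migration_loss(w: Matrix, old: List[Vector], k: int = 2, lam: float = 1.0) -> float:
    """Hypothetical objective: similarity distortion + lam * neighborhood contraction."""
    new = [project(w, x) for x in old]
    n = len(old)

    # Term 1: mean squared change in pairwise cosine similarity after projection.
    distortion, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            distortion += (cosine(new[i], new[j]) - cosine(old[i], old[j])) ** 2
            pairs += 1

    # Term 2: for each SID's k nearest old-space neighbors, penalize any growth
    # in distance after projection (squared hinge), so neighborhoods only contract.
    contraction, count = 0.0, 0
    for i in range(n):
        for j in k_nearest(old, i, k):
            growth = math.dist(new[i], new[j]) - math.dist(old[i], old[j])
            contraction += max(0.0, growth) ** 2
            count += 1

    return distortion / max(pairs, 1) + lam * contraction / max(count, 1)


# Usage: toy 2-D "old architecture" SID embeddings.
old_sids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
print(migration_loss([[1.0, 0.0], [0.0, 1.0]], old_sids))  # identity projection: 0.0
print(migration_loss([[2.0, 0.0], [0.0, 2.0]], old_sids))  # expansion is penalized
```

Because only growth in neighbor distance is penalized, a uniformly shrinking projection such as `0.5 * I` preserves cosine similarities and incurs no contraction penalty, while `2 * I` does.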
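The dual-temperature distillation can be sketched as a per-position loss. This is a minimal sketch under stated assumptions: each position is tagged as either a natural-language or a SID position, the temperature is chosen per position (`t_nl` higher, `t_sid` lower), the standard `T**2` scaling from knowledge distillation is applied, and the long-tail importance weight is a hypothetical `count ** -alpha` of the SID's training frequency. All function names and default values are illustrative, and the SID embedding-alignment term is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def long_tail_weight(sid_count: int, alpha: float = 0.5) -> float:
    """Hypothetical importance weight: rarer SIDs get a larger weight (count ** -alpha)."""
    return max(sid_count, 1) ** (-alpha)


def dual_temperature_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    is_sid: Sequence[bool],
    sid_counts: Sequence[int],  # ignored at natural-language positions
    t_nl: float = 2.0,
    t_sid: float = 0.5,
) -> float:
    total, norm = 0.0, 0.0
    for t_row, s_row, sid, count in zip(teacher_logits, student_logits, is_sid, sid_counts):
        # Lower temperature at SID positions keeps teacher targets sharp, so the
        # student is not encouraged to spread mass onto near-miss SIDs.
        temp = t_sid if sid else t_nl
        weight = long_tail_weight(count) if sid else 1.0
        p = softmax(t_row, temp)
        q = softmax(s_row, temp)
        # temp**2 keeps gradient magnitudes comparable across temperatures.
        total += weight * temp ** 2 * kl_div(p, q)
        norm += weight
    return total / norm if norm > 0 else 0.0


# Usage: one natural-language position followed by one rare SID position.
loss = dual_temperature_kd_loss(
    teacher_logits=[[2.0, 1.0, 0.0], [3.0, 0.0, 0.0]],
    student_logits=[[1.5, 1.0, 0.5], [2.0, 1.0, 0.0]],
    is_sid=[False, True],
    sid_counts=[0, 4],
)
print(loss)
```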
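A closed-loop controller of the kind described could be as simple as the following sketch. It assumes a scalar SID quality metric where higher is better (for example, a validation match rate for generated SIDs), and it reacts to degradation by lowering the SID temperature and raising the SID loss weight within fixed bounds, then drifts back toward defaults when quality recovers. The class name, thresholds, and step sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SidQualityController:
    """Hypothetical controller for SID distillation temperature and loss weight."""
    target: float                 # assumed acceptable SID quality level
    t_sid: float = 1.0            # current SID distillation temperature
    sid_weight: float = 1.0       # current weight on the SID loss terms
    t_sid_min: float = 0.25       # floor so the temperature never collapses to 0
    sid_weight_max: float = 4.0   # cap so SID terms cannot swamp the other losses
    tighten: float = 0.8          # multiplicative step applied on degradation
    relax: float = 1.05           # slower step back toward defaults when healthy

    def update(self, sid_quality: float) -> None:
        if sid_quality < self.target:
            # Degraded: sharpen SID targets and emphasize the SID loss.
            self.t_sid = max(self.t_sid_min, self.t_sid * self.tighten)
            self.sid_weight = min(self.sid_weight_max, self.sid_weight / self.tighten)
        else:
            # Healthy: drift back toward the defaults (1.0) without overshooting.
            self.t_sid = min(1.0, self.t_sid * self.relax)
            self.sid_weight = max(1.0, self.sid_weight / self.relax)


# Usage: feed one quality reading per evaluation round.
controller = SidQualityController(target=0.9)
for quality in [0.95, 0.85, 0.80, 0.92]:
    controller.update(quality)
    print(round(controller.t_sid, 3), round(controller.sid_weight, 3))
```

Multiplicative steps with hard bounds keep the controller stable: repeated degradation saturates at `t_sid_min` and `sid_weight_max` rather than diverging.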
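The staleness tiers and the cross-tier decay can be illustrated with a toy user state. This sketch assumes the real-time tier is an exponentially decayed sum of recently interacted item embeddings, that the online fusion network can be stood in for by a fixed linear mix, and that the consistency protocol multiplies the real-time state by a constant `refresh_decay` on each near-line refresh. `TieredUserState`, `OFFLINE_ITEM_EMBS`, `score`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]

# Offline tier: item SID embeddings precomputed in batch (toy values here).
OFFLINE_ITEM_EMBS: Dict[str, Vector] = {
    "sid_a": [1.0, 0.0],
    "sid_b": [0.0, 1.0],
}


@dataclass
class TieredUserState:
    nearline: Vector             # near-line tier: periodically recomputed user embedding
    realtime: Vector             # real-time tier: running adaptation state
    event_decay: float = 0.9     # hypothetical per-event decay of the real-time state
    refresh_decay: float = 0.1   # hypothetical decay applied when near-line refreshes

    def observe(self, item_id: str) -> None:
        # Real-time update: constant work per event with respect to history
        # length (O(d) in the embedding size), no replay of past events.
        emb = OFFLINE_ITEM_EMBS[item_id]
        self.realtime = [self.event_decay * r + e for r, e in zip(self.realtime, emb)]

    def refresh_nearline(self, new_nearline: Vector) -> None:
        # Cross-tier consistency: the fresh near-line embedding already reflects
        # the recent events, so shrink the real-time state instead of adding them again.
        self.nearline = list(new_nearline)
        self.realtime = [self.refresh_decay * r for r in self.realtime]


def score(state: TieredUserState, item_id: str, mix: float = 0.5) -> float:
    # Online tier: a fixed linear mix standing in for a learned lightweight fusion network.
    user = [n + mix * r for n, r in zip(state.nearline, state.realtime)]
    return sum(u * e for u, e in zip(user, OFFLINE_ITEM_EMBS[item_id]))


# Usage: one interaction, then a near-line refresh that has absorbed it.
user = TieredUserState(nearline=[0.0, 0.0], realtime=[0.0, 0.0])
user.observe("sid_b")
print(score(user, "sid_b"))        # 0.5, driven only by the real-time tier
user.refresh_nearline([0.0, 1.0])
print(score(user, "sid_b"))        # about 1.05
```

Without the refresh decay, the second score would be 1.5, because the `sid_b` interaction would be counted once by the refreshed near-line embedding and again by the real-time state.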

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Dual-Vocabulary LLM Recommendations", Technical Disclosure Commons, (June 29, 2026)
https://www.tdcommons.org/dpubs_series/10645

Technical Disclosure Commons

Defensive Publications Series

Dual-Vocabulary LLM Recommendations