Abstract
A hybrid offline-to-real-time architecture generates context-aware ad copy while meeting the latency constraints of live ad auctions. In an offline phase, a large language model (LLM) analyzes a product catalog to pre-generate ad copy templates that contain semantic placeholders. During the real-time ad auction, a lightweight engine parses a user’s search query to extract contextual intent. The system then selects an appropriate template and injects the extracted intent into the placeholders to assemble the final ad copy. This two-phase approach decouples the computationally intensive LLM generation from the time-sensitive ad serving process. As a result, the system provides the semantic quality of LLM-generated text at low latency, and it reduces computational cost because LLM inference runs only as an offline batch process.
Keywords: hybrid offline-to-real-time architecture, context-aware keyword insertion, generative artificial intelligence, large language model, template generation, query parsing, context-aware injection, ad serving infrastructure
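The two phases described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the names `AdTemplate`, `TEMPLATE_STORE`, `INTENT_LEXICON`, `parse_query`, `select_template`, and `assemble_ad` are hypothetical, the template store is hand-written to stand in for the output of the offline LLM batch job, and a simple keyword lexicon stands in for the lightweight query parser.

```python
import re
from dataclasses import dataclass

# Placeholders are written as {slot_name} inside the template text.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


@dataclass(frozen=True)
class AdTemplate:
    """One pre-generated ad copy template with semantic placeholders."""
    text: str

    @property
    def slots(self) -> frozenset:
        # Names of the placeholders this template needs filled.
        return frozenset(PLACEHOLDER.findall(self.text))


# Offline phase (assumed): a batch LLM job would write entries like these.
# Hand-written here; the last template has no slots and acts as a fallback.
TEMPLATE_STORE = {
    "trail-shoe-01": [
        AdTemplate("Trail shoes built for {activity} in {color}"),
        AdTemplate("Trail shoes built for {activity}"),
        AdTemplate("Trail shoes for every adventure"),
    ],
}

# Real-time phase (assumed): a tiny lexicon maps query words to slot names.
INTENT_LEXICON = {
    "color": {"red", "blue", "black"},
    "activity": {"hiking", "running", "walking"},
}


def parse_query(query: str) -> dict:
    """Extract slot values from a search query with dictionary lookups only."""
    intent = {}
    for token in re.findall(r"[a-z]+", query.lower()):
        for slot, vocabulary in INTENT_LEXICON.items():
            if token in vocabulary and slot not in intent:
                intent[slot] = token
    return intent


def select_template(product_id: str, intent: dict) -> AdTemplate:
    """Pick the template that uses the most extracted slots and needs no others."""
    available = set(intent)
    candidates = [t for t in TEMPLATE_STORE[product_id] if t.slots <= available]
    return max(candidates, key=lambda t: len(t.slots))


def assemble_ad(product_id: str, query: str) -> str:
    """Real-time path: parse the query, select a template, inject the intent."""
    intent = parse_query(query)
    template = select_template(product_id, intent)
    return template.text.format(**intent)


if __name__ == "__main__":
    print(assemble_ad("trail-shoe-01", "blue shoes for hiking"))
```

In this sketch the real-time path performs only dictionary lookups and string formatting, which mirrors the point of the architecture: no LLM call happens at serving time. A production system would presumably use a richer parser and a ranking model for template selection; those details are not specified in the abstract.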
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Yao, Karen and Ramamurthi, Indu, "Hybrid Offline-to-Real-Time Architecture for Context-Aware Keyword Insertion", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10521