Abstract

A Hybrid Offline-to-Real-Time Architecture generates context-aware ad copy while meeting the latency constraints of live ad auctions. In an offline phase, a large language model (LLM) analyzes a product catalog and pre-generates ad copy templates that contain semantic placeholders. During the real-time ad auction, a lightweight engine parses the user’s search query to extract contextual intent. The system then selects an appropriate template and injects the extracted intent into its placeholders to assemble the final ad copy. This two-pass approach decouples computationally intensive LLM generation from time-sensitive ad serving. As a result, it delivers the semantic quality of LLM-generated text at low latency, and it reduces computational cost because LLM inference runs as an offline batch process rather than once per auction.
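The real-time half of this flow (parse the query, select a template, fill its placeholders) can be illustrated with a minimal Python sketch. Everything here is hypothetical: the hard-coded `TEMPLATES` store stands in for the output of the offline LLM batch job, `INTENT_LEXICON` and `extract_intent` are a simple keyword-matching stand-in for the lightweight query parser, and the rule "pick the template whose placeholders are all filled and that uses the most of them" is an assumed selection policy, not one the abstract specifies.

```python
import re
from dataclasses import dataclass

# Placeholders are written as {name} inside template text (assumed convention).
PLACEHOLDER = re.compile(r"\{(\w+)\}")


@dataclass(frozen=True)
class AdTemplate:
    text: str

    @property
    def slots(self) -> frozenset:
        # Names of the semantic placeholders this template expects.
        return frozenset(PLACEHOLDER.findall(self.text))


# Hypothetical stand-in for templates pre-generated offline by an LLM.
TEMPLATES = {
    "boot-123": [
        AdTemplate("{attribute} boots built for {use_case}"),
        AdTemplate("Boots made for {use_case}"),
        AdTemplate("Trail-ready boots, in stock now"),  # fallback, no slots
    ],
}

# Hypothetical vocabulary the lightweight parser maps query terms onto.
INTENT_LEXICON = {
    "use_case": {"hiking", "running", "commuting"},
    "attribute": {"waterproof", "lightweight", "insulated"},
}


def extract_intent(query: str) -> dict:
    """Map query tokens to placeholder slots with a cheap keyword lookup."""
    tokens = re.findall(r"[a-z]+", query.lower())
    intent = {}
    for slot, vocabulary in INTENT_LEXICON.items():
        for token in tokens:
            if token in vocabulary:
                intent[slot] = token  # first matching token wins
                break
    return intent


def select_template(product_id: str, intent: dict):
    """Keep templates whose slots are all filled; prefer the most specific."""
    candidates = [
        t for t in TEMPLATES.get(product_id, []) if t.slots.issubset(intent)
    ]
    return max(candidates, key=lambda t: len(t.slots), default=None)


def assemble_ad(product_id: str, query: str):
    """Real-time path: parse, select, inject. Returns None if nothing fits."""
    intent = extract_intent(query)
    template = select_template(product_id, intent)
    if template is None:
        return None
    text = template.text.format_map(intent)
    return text[:1].upper() + text[1:]  # sentence-case the assembled copy


if __name__ == "__main__":
    for q in ["Waterproof hiking boots size 10", "boots for running", "red boots"]:
        print(q, "->", assemble_ad("boot-123", q))
```

Because every template is written ahead of time, the serving path does only tokenization, dictionary lookups, and string formatting, which is the property the architecture relies on to stay within auction latency budgets. The fallback template with no placeholders keeps the sketch from returning empty copy when the query carries no recognizable intent.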

Keywords: hybrid offline-to-real-time architecture, context-aware keyword insertion, generative artificial intelligence, large language model, template generation, query parsing, context-aware injection, ad serving infrastructure

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.