Defensive Publications Series

Hybrid Offline-to-Real-Time Architecture for Context-Aware Keyword Insertion

Abstract

A hybrid offline-to-real-time architecture generates context-aware ad copy while meeting the latency constraints of live ad auctions. In an offline phase, a large language model (LLM) analyzes a product catalog and pre-generates ad copy templates that contain semantic placeholders. During the real-time ad auction, a lightweight engine parses the user’s search query to extract contextual intent, selects an appropriate template, and injects the extracted intent into its placeholders to assemble the final ad copy. This two-pass approach decouples computationally intensive LLM generation from time-sensitive ad serving: the ad copy retains the semantic quality of LLM-generated text, serving stays low-latency, and computational cost falls because LLM inference runs as an offline batch process.
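The real-time half of the flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the `Template` record, the hard-coded `TEMPLATES` list standing in for the offline LLM batch output, the `{slot}` placeholder syntax, the toy `INTENT_LEXICON`, and the rule of preferring the template with the most fillable placeholders. A production query parser and template ranker would likely be far richer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Template:
    product: str  # catalog product the template was generated for
    text: str     # ad copy containing {slot} placeholders


# Hypothetical stand-in for templates pre-generated offline by an LLM.
TEMPLATES = [
    Template("running shoes", "Shop {modifier} running shoes for {audience}. Free returns."),
    Template("running shoes", "Running shoes built for every runner. Free returns."),
    Template("hiking boots", "Explore {modifier} hiking boots for {audience}."),
]

# Hypothetical keyword lexicon mapping placeholder slots to known query terms.
INTENT_LEXICON = {
    "modifier": {"waterproof", "lightweight", "cheap"},
    "audience": {"women", "men", "kids"},
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def parse_query(query: str) -> dict[str, str]:
    """Lightweight intent extraction: first lexicon hit per slot wins."""
    intent: dict[str, str] = {}
    for token in re.findall(r"[a-z]+", query.lower()):
        for slot, vocabulary in INTENT_LEXICON.items():
            if token in vocabulary and slot not in intent:
                intent[slot] = token
    return intent


def select_template(product: str, intent: dict[str, str]) -> Optional[Template]:
    """Pick the template for this product whose placeholders can all be
    filled, preferring the one that uses the most extracted intent."""
    best, best_score = None, -1
    for template in TEMPLATES:
        if template.product != product:
            continue
        slots = set(PLACEHOLDER.findall(template.text))
        if not slots.issubset(intent):
            continue  # would leave an unfilled placeholder
        if len(slots) > best_score:
            best, best_score = template, len(slots)
    return best


def assemble_ad(product: str, query: str) -> Optional[str]:
    """Real-time path: parse the query, select a template, inject intent."""
    intent = parse_query(query)
    template = select_template(product, intent)
    if template is None:
        return None
    return PLACEHOLDER.sub(lambda m: intent[m.group(1)], template.text)


print(assemble_ad("running shoes", "lightweight running shoes for women"))
```

No model inference happens on the serving path in this sketch: the only per-request work is tokenizing the query, scanning a small template list, and one regex substitution. A query with no recognized intent falls back to a placeholder-free template when one exists for the product.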

Keywords: hybrid offline-to-real-time architecture, context-aware keyword insertion, generative artificial intelligence, large language model, template generation, query parsing, context-aware injection, ad serving infrastructure

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Yao, Karen and Ramamurthi, Indu, "Hybrid Offline-to-Real-Time Architecture for Context-Aware Keyword Insertion", Technical Disclosure Commons, (June 22, 2026)
https://www.tdcommons.org/dpubs_series/10521
