Abstract
Programmatic advertising systems can struggle to detect natively embedded advertisements, such as host-read sponsorships in video or audio content. This creates a risk of serving conflicting ads and hinders performance measurement. The described technique uses an automated server-side pipeline that generates a timestamped transcript of the media content. The system identifies candidate ad segments by scanning the transcript for commercial keywords, and may then use a large language model, guided by a specific prompt, to analyze those segments and extract the canonical name of the sponsored brand. The output is structured data, including brand names and timestamps, which can let ad servers apply competitive exclusion rules and may facilitate calculating performance metrics for certain embedded ad inventory.
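As an illustration only, the keyword-scan and structured-output steps could be sketched in Python roughly as follows. The keyword list, the `Segment` type, the `max_gap` merge threshold, the prompt wording, and the record field names are all assumptions made for this sketch and are not taken from the disclosure. The language model call itself is omitted; `build_brand_prompt` only assembles the text that would be sent to it.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the disclosure does not enumerate one.
AD_KEYWORDS = ("sponsored by", "brought to you by", "promo code", "use code", "sponsor")


@dataclass
class Segment:
    start: float  # seconds from the start of the media
    end: float
    text: str


def find_candidate_segments(transcript, keywords=AD_KEYWORDS, max_gap=5.0):
    """Return transcript spans containing commercial keywords.

    Matching segments that start within `max_gap` seconds of the previous
    match are merged into a single candidate span (an assumed heuristic).
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    candidates = []
    for seg in transcript:
        if not pattern.search(seg.text):
            continue
        if candidates and seg.start - candidates[-1].end <= max_gap:
            prev = candidates[-1]
            candidates[-1] = Segment(prev.start, seg.end, prev.text + " " + seg.text)
        else:
            candidates.append(Segment(seg.start, seg.end, seg.text))
    return candidates


def build_brand_prompt(segment):
    """Assemble an illustrative prompt asking a model for the sponsored brand."""
    return (
        "The following transcript excerpt may contain a host-read sponsorship. "
        "Reply with only the canonical name of the sponsored brand, or NONE.\n\n"
        f"Excerpt: {segment.text}"
    )


def to_record(segment, brand):
    """Package a brand name and its timestamps as structured data."""
    return {"brand": brand, "start_s": segment.start, "end_s": segment.end}


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 10.0, "Welcome back to the show."),
        Segment(10.0, 18.0, "This episode is sponsored by Acme VPN."),
        Segment(18.0, 25.0, "Use code SHOW for ten percent off."),
        Segment(25.0, 40.0, "Now, on to the interview."),
    ]
    for candidate in find_candidate_segments(transcript):
        print(build_brand_prompt(candidate))
        # The brand would come from the model's reply; it is hard-coded here.
        print(to_record(candidate, "Acme VPN"))
```

In this sketch the keyword scan acts as a cheap first-pass filter, so only the short candidate spans, rather than the whole transcript, would need to be passed to the language model.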
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Sable, Sanket and King, Kevin, "Detecting Embedded Advertisements in Media via Transcript Analysis and a Large Language Model", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/9785