Ji Chen
Yu Chen


This paper describes a system that uses supervised machine learning algorithms to automatically detect optimal ad insertion points in an item of video content. The system can learn from examples of manual ad insertion, which can be obtained from broadcast television programming that contains ads or from manually applied labels gathered through crowdsourcing. The system can process the raw audio and video signals to extract a set of features that best describe the points of manual ad insertion, and feed those features into a machine learning model. The machine learning model can be a deep neural network or another supervised classification algorithm. Because the system learns from human input, it can produce results that best fit human perception of optimal ad insertion points. The system can thus improve the user experience with video ads, drive higher ad revenue for content publishers, and reduce the burden on content providers.
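
The pipeline described above (features extracted around candidate points, then a supervised classifier trained on manual insertion labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two features (local audio silence and frame-to-frame visual change), the toy per-second signal values, the function names, and the use of a small logistic regression in place of the deep neural network or other classifier the paper mentions.

```python
import math

# Hypothetical per-second signals for a short clip (values in [0, 1]).
# audio_energy: loudness; frame_diff: visual change versus the previous frame.
audio_energy = [0.9, 0.8, 0.1, 0.9, 0.7, 0.05, 0.8, 0.9]
frame_diff = [0.1, 0.2, 0.9, 0.1, 0.3, 0.95, 0.2, 0.1]
# Hypothetical manual labels: 1 where a human inserted an ad, else 0.
labels = [0, 0, 1, 0, 0, 1, 0, 0]


def extract_features(audio, video, t):
    """Assumed features for candidate second t: silence and visual change."""
    return [1.0 - audio[t], video[t]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that feature vector x is a good ad insertion point."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


X = [extract_features(audio_energy, frame_diff, t) for t in range(len(labels))]
w, b = train_logistic(X, labels)

# Score every candidate second and pick the highest-scoring one.
scores = [predict(w, b, x) for x in X]
best = max(range(len(scores)), key=lambda t: scores[t])
print("best insertion second:", best)
```

In this sketch the labels play the role of the manual ad insertion examples, and the trained model ranks candidate points by predicted probability. A real system would need far richer features and much more training data than this toy clip.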

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.