Systems and methods are described for determining breakpoints in media content using a machine learning architecture that accounts for the context of the media content, and for placing separate pieces of additional content at those breakpoints. The machine learning architecture determines the number of pieces of additional content to insert into the media content, divides the media content into scenes, tags each scene with appropriate identifiers based on a comparison to reference content, and places a breakpoint at the end of each scene. It then selects the breakpoints that will receive additional content by determining the content and context of the scene or scenes before and/or after each breakpoint, and inserts the additional content at the selected breakpoints.
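
The selection step described above can be illustrated with a minimal sketch. It is not the disclosed machine learning architecture: it assumes the scenes have already been divided and tagged, takes the number of pieces of additional content as given, and substitutes a simple tag-overlap count for the learned content-and-context determination. The names `Scene`, `score_breakpoint`, and `place_content`, and the greedy assignment, are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """A tagged scene; the breakpoint for the scene sits at `end` (seconds)."""
    start: float
    end: float
    tags: frozenset[str]


def score_breakpoint(before: Scene, after: Scene | None,
                     content_tags: frozenset[str]) -> int:
    """Hypothetical context score: number of tags shared between the
    additional content and the scenes before and/or after the breakpoint."""
    context = before.tags | (after.tags if after is not None else frozenset())
    return len(context & content_tags)


def place_content(scenes: list[Scene],
                  contents: dict[str, frozenset[str]]) -> dict[str, float]:
    """Greedily assign each piece of additional content to the unused
    breakpoint with the highest context score (ties go to the earliest)."""
    used: set[int] = set()
    placements: dict[str, float] = {}
    for name, tags in contents.items():
        best: tuple[int, int] | None = None  # (score, breakpoint index)
        for i, scene in enumerate(scenes):
            if i in used:
                continue
            after = scenes[i + 1] if i + 1 < len(scenes) else None
            score = score_breakpoint(scene, after, tags)
            if best is None or score > best[0]:
                best = (score, i)
        if best is None:  # more pieces of content than breakpoints
            break
        used.add(best[1])
        placements[name] = scenes[best[1]].end
    return placements


if __name__ == "__main__":
    scenes = [
        Scene(0.0, 60.0, frozenset({"cooking"})),
        Scene(60.0, 150.0, frozenset({"news"})),
        Scene(150.0, 240.0, frozenset({"beach", "travel"})),
        Scene(240.0, 300.0, frozenset({"sports"})),
    ]
    contents = {
        "sunscreen_ad": frozenset({"beach", "travel"}),
        "cookware_ad": frozenset({"cooking"}),
    }
    print(place_content(scenes, contents))
```

In this sketch every scene end is a candidate breakpoint, and only as many breakpoints as there are pieces of additional content actually receive an insertion; a trained model would replace `score_breakpoint`.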

