A technique is proposed for efficiently compressing presentation-dominated video files. Processing logic may receive, from a video conferencing platform, a video file that includes multiple frames and audio. The processing logic may further identify multiple presentation slides associated with the video file; in some instances, it does so by receiving the presentation slides from the video conferencing platform or from a user of the platform (e.g., via a client device). The processing logic may then generate a mapping of each presentation slide to one or more of the frames based on a similarity level between a given slide and a given frame, where the similarity level may be determined using one or more machine learning models, and organize this mapping in a data structure (the frame-slide mapping data structure). Finally, the processing logic may compress the presentation slides, the audio, and the frame-slide mapping data structure to obtain a compressed file.

The video can be recreated (decompressed) by inserting the presentation slides into the video frames as specified by the frame-slide mapping data structure and adding the audio (e.g., the audio channel) to the generated video. Because the slides are stored once, in their original form, and the frames are represented only by the compact mapping, this approach results in lower storage cost than conventional compression algorithms and better video quality for the static content (e.g., text, diagrams, and sheets) of the recreated video.
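The compress-and-recreate pipeline described above can be sketched in a few dozen lines. This is a minimal, hypothetical illustration under several stated assumptions, not the method's actual implementation: frames and slides are modeled as equal-length, flattened grayscale pixel lists; a mean-absolute-difference score stands in for the machine learning similarity model(s); every frame is assumed to show exactly one slide, which replaces the whole frame on reconstruction; and the run-length `SlideRun` format, the JSON payload, and the use of `zlib` are illustrative choices. The names `similarity`, `SlideRun`, `build_mapping`, `compress`, and `decompress` are all hypothetical.

```python
"""Hypothetical sketch of slide-based video compression.

Assumptions (not from the source): frames and slides are equal-length,
flattened grayscale pixel lists (0-255); a mean-absolute-difference score
stands in for the machine learning similarity model; every frame shows
exactly one slide, which replaces the whole frame on reconstruction.
"""
import base64
import json
import zlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Image = Sequence[int]  # flattened grayscale pixels, 0-255


def similarity(a: Image, b: Image) -> float:
    """Similarity in [0, 1]; 1.0 means pixel-identical (stand-in for an ML model)."""
    if len(a) != len(b):
        raise ValueError("frame and slide must have the same size")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_abs_diff / 255.0


@dataclass
class SlideRun:
    """One mapping entry (hypothetical format): frames [start, end) show `slide`."""
    start: int
    end: int
    slide: int


def build_mapping(frames: Sequence[Image], slides: Sequence[Image]) -> List[SlideRun]:
    """Map each frame to its most similar slide, merging consecutive frames into runs."""
    runs: List[SlideRun] = []
    for i, frame in enumerate(frames):
        # Index of the best-matching slide; ties go to the lower index.
        j = max(range(len(slides)), key=lambda k: similarity(frame, slides[k]))
        if runs and runs[-1].slide == j:
            runs[-1].end = i + 1  # same slide as the previous frame: extend the run
        else:
            runs.append(SlideRun(start=i, end=i + 1, slide=j))
    return runs


def compress(slides: Sequence[Image], audio: bytes, runs: Sequence[SlideRun]) -> bytes:
    """Pack slides, audio, and the frame-slide mapping, then deflate with zlib."""
    payload = {
        "slides": [list(s) for s in slides],
        "audio": base64.b64encode(audio).decode("ascii"),
        "mapping": [[r.start, r.end, r.slide] for r in runs],
    }
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def decompress(blob: bytes) -> Tuple[List[List[int]], bytes]:
    """Recreate the frame sequence by inserting slides per the mapping; return (frames, audio)."""
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    slides = payload["slides"]
    frames: List[List[int]] = []
    for start, end, slide in payload["mapping"]:
        frames.extend(list(slides[slide]) for _ in range(end - start))
    return frames, base64.b64decode(payload["audio"])


if __name__ == "__main__":
    slides = [[0] * 16, [255] * 16]          # a dark slide and a bright slide
    frames = [[3] * 16, [0] * 16, [5] * 16,  # noisy captures of slide 0
              [250] * 16, [255] * 16]        # captures of slide 1
    runs = build_mapping(frames, slides)
    print(runs)
    # → [SlideRun(start=0, end=3, slide=0), SlideRun(start=3, end=5, slide=1)]
    restored, audio = decompress(compress(slides, b"pcm-bytes", runs))
    print(len(restored), audio)  # → 5 b'pcm-bytes'
```

Storing the mapping as runs rather than one entry per frame is a design choice of this sketch: consecutive frames that show the same slide collapse into a single entry, so the mapping grows with the number of slide changes rather than with the frame count. Frames that match no slide (for example, a camera view) are outside the scope of this sketch.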

This work is licensed under a Creative Commons Attribution 4.0 License.