Synchronization between the audio and video tracks in recording equipment is usually achieved using an audio-first approach. In this approach, the timestamp of a target video frame is compared to the timestamp of the sound emitted during that frame, with both timestamps counted in units of video frames. Video has a relatively low sampling rate: for example, a 60 frames-per-second video has frames separated by about 16.67 milliseconds. Because timestamps counted in frames cannot resolve anything finer than one frame interval, the resulting measurement of audio-video asynchrony is imprecise.
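The frame-interval arithmetic can be illustrated with a minimal Python sketch. The helper names and the two sample timestamps are hypothetical, chosen only to show how counting in whole frames hides sub-frame differences; they are not taken from the disclosure.

```python
def frame_period_ms(fps: float) -> float:
    """Time between consecutive video frames, in milliseconds."""
    return 1000.0 / fps


def quantize_to_frames(t_ms: float, fps: float) -> int:
    """Express a timestamp as a whole number of elapsed video frames."""
    return int(t_ms // frame_period_ms(fps))


# At 60 frames per second the frame period is about 16.67 ms.
period = frame_period_ms(60.0)

# Two hypothetical event times, 10 ms apart, land in the same frame,
# so a frame-unit timestamp cannot tell them apart.
frame_a = quantize_to_frames(20.0, 60.0)
frame_b = quantize_to_frames(30.0, 60.0)
```

Here both events map to the same frame index even though they are 10 ms apart, which is the imprecision described above.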
This disclosure describes video-first techniques for audio-video synchronization. A target video frame is captured, and its timestamp is mapped to the audio track. Because audio is sampled at a high rate, the audio track has millisecond-level time resolution. Using the audio track, the timestamp of the sound (pulse) emitted during the target video frame is determined to millisecond accuracy. The difference between the timestamp of the target video frame and the timestamp of the audio pulse gives a high-precision estimate of audio-video asynchrony.
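The video-first steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosure's implementation: the pulse is located with a simple amplitude threshold inside a search window around the frame's timestamp, and the function names, the 100 ms window, the 0.5 threshold, and the synthetic 48 kHz signal are all hypothetical.

```python
def frame_timestamp_ms(frame_index: int, fps: float) -> float:
    """Timestamp of a video frame on the shared timeline, in milliseconds."""
    return frame_index * 1000.0 / fps


def find_pulse_onset_ms(samples, sample_rate, center_ms, window_ms, threshold):
    """Return the time (ms) of the first sample near center_ms whose
    magnitude reaches the threshold, or None if no pulse is found.
    Assumption: a plain amplitude threshold is enough to locate the pulse."""
    lo = max(0, int((center_ms - window_ms) * sample_rate / 1000.0))
    hi = min(len(samples), int((center_ms + window_ms) * sample_rate / 1000.0))
    for i in range(lo, hi):
        if abs(samples[i]) >= threshold:
            return i * 1000.0 / sample_rate
    return None


def av_offset_ms(frame_index, fps, samples, sample_rate,
                 window_ms=100.0, threshold=0.5):
    """Audio pulse time minus video frame time; positive means audio lags."""
    video_ms = frame_timestamp_ms(frame_index, fps)
    audio_ms = find_pulse_onset_ms(samples, sample_rate, video_ms,
                                   window_ms, threshold)
    if audio_ms is None:
        return None
    return audio_ms - video_ms


# Synthetic check: one second of silence at 48 kHz with a 10 ms pulse
# that starts at sample 24360, i.e. 507.5 ms.
sample_rate = 48000
samples = [0.0] * sample_rate
pulse_start = 24360
for i in range(pulse_start, pulse_start + 480):
    samples[i] = 1.0

# Frame 30 of a 60 frames-per-second video sits at 500 ms.
offset = av_offset_ms(30, 60.0, samples, sample_rate)
```

In this synthetic case the estimated offset is 7.5 ms, which is smaller than one 16.67 ms frame interval and would therefore be invisible to a measurement counted in frames. A real recording would likely need a more robust onset detector than a fixed threshold.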
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Lee, Jason Chihhao; Kao, Peggy Pei Chi; Yu, Hung-Jen; Liang, Hung Ren; Huang, James Chen Chao; Chung, Jabez Hsu; Lee, Hao-Wei; Huang, Eric; and Lin, Lin Chi, "Precise Latency Calculation for Audio-Video Synchronization", Technical Disclosure Commons, (November 06, 2022)