Abstract
Machine learning models can struggle to count distinct objects in videos, particularly when frame rates or object speeds vary. The disclosed systems and techniques can address these challenges with a data augmentation methodology for generating training data. For example, multiple training examples can be created from a single source video by sampling its frames at various temporal frequencies. A dynamic labeling protocol may then assign each new sequence an object count label that reflects the number of objects discernible in that specific, potentially sparser, sequence. Training a model on these varied representations can make its object counting more robust to variations in the frame rate of an input video and the apparent velocity of objects within it.
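The sampling and relabeling idea can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function names, the per-frame annotation format (a set of object IDs visible in each frame), the fixed strides, and the rule that an object is "discernible" if it is visible in at least one retained frame are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def sample_indices(num_frames: int, stride: int, offset: int = 0) -> List[int]:
    """Return the indices of every `stride`-th frame, starting at `offset`."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(offset, num_frames, stride))


def count_discernible(
    frame_object_ids: Sequence[Set[int]], indices: Sequence[int]
) -> int:
    """Assumed labeling rule: an object is counted if it is visible
    in at least one of the retained frames."""
    seen: Set[int] = set()
    for i in indices:
        seen |= frame_object_ids[i]
    return len(seen)


def make_examples(
    frame_object_ids: Sequence[Set[int]], strides: Sequence[int] = (1, 2, 4)
) -> List[Tuple[int, List[int], int]]:
    """Build one (stride, kept frame indices, count label) tuple per stride."""
    examples = []
    for stride in strides:
        indices = sample_indices(len(frame_object_ids), stride)
        label = count_discernible(frame_object_ids, indices)
        examples.append((stride, indices, label))
    return examples


# Hypothetical annotations: the object IDs visible in each of 8 frames.
frames = [{1}, {1, 2}, {2}, {3}, set(), {4}, {4}, set()]

for stride, indices, label in make_examples(frames):
    print(f"stride={stride} frames={indices} count={label}")
```

In this toy data, object 3 is visible only in frame 3, so the stride-2 sequence (frames 0, 2, 4, 6) is labeled with a count of 3 rather than 4, and the stride-4 sequence (frames 0, 4) is labeled with a count of 1. A real pipeline would need its own definition of when an object is discernible in a sparser sequence.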
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Rendulic, Ivor and Pavetić, Filip, "Multi-Frequency Temporal Sampling for Training Video Object Counting Models", Technical Disclosure Commons, (April 01, 2026)
https://www.tdcommons.org/dpubs_series/9682