An important objective of user-generated content platforms, such as audio/video hosting or streaming platforms, is to ensure that content available via the platform is authorized for use, e.g., that it is provided by the true owner or with the true owner's permission. To keep unauthorized content from being made available, such platforms match uploaded videos against a repository of reference (original) videos. To evade this matching, uploaders of unauthorized content apply constantly evolving content transformation strategies. This disclosure describes automated techniques that speed up and scale the collection of training examples of recent content transformations designed to bypass match detection. The techniques include synthetic generation, i.e., automatically generating content examples similar to known match-avoiding content, and scaled-up mining and filtering, i.e., searching for other content that is similar to known match-avoiding content along some dimension and filtering the results with high-performance matching algorithms to identify further examples of match-avoiding content. The resulting corpus can be used to train and validate a new version of the matching procedures that is robust to recent match-avoidance attempts.
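The two collection strategies can be illustrated with a toy sketch. It is not the disclosed implementation: the disclosure does not specify a video representation, fingerprint, or matcher, so everything here is a hypothetical stand-in. A video is reduced to a list of per-frame brightness values, the fingerprint is the sign of the change between consecutive frames, and the "high-performance" filter is a sliding-window comparison with an assumed agreement `threshold`. The transformation functions (`brighten`, `crop`, `drop_every_other`) are illustrative placeholders for real match-avoidance edits.

```python
import random
from typing import Callable, List

# Hypothetical representation: one brightness value per frame.
Video = List[float]


def fingerprint(video: Video) -> List[int]:
    # Toy fingerprint: 1 if brightness rises between consecutive frames, else 0.
    return [1 if b > a else 0 for a, b in zip(video, video[1:])]


def match_score(query: Video, reference: Video) -> float:
    # Slide the query fingerprint over the reference fingerprint and return
    # the best fraction of agreeing bits (0.0 if the query is longer).
    q, r = fingerprint(query), fingerprint(reference)
    if not q or len(q) > len(r):
        return 0.0
    best = 0.0
    for offset in range(len(r) - len(q) + 1):
        agree = sum(1 for i, bit in enumerate(q) if bit == r[offset + i])
        best = max(best, agree / len(q))
    return best


# Hypothetical transformations standing in for match-avoidance edits.
def brighten(v: Video) -> Video:
    return [x + 0.2 for x in v]


def crop(v: Video) -> Video:
    return v[len(v) // 4 : 3 * len(v) // 4]


def drop_every_other(v: Video) -> Video:
    return v[::2]


def synthesize(reference: Video, transforms: List[Callable[[Video], Video]]) -> List[Video]:
    # Synthetic generation: apply each known transformation to a reference video.
    return [t(reference) for t in transforms]


def mine_and_filter(candidates: List[Video], reference: Video, threshold: float) -> List[Video]:
    # Mining + filtering: keep only mined candidates that the (stand-in)
    # accurate matcher confirms as copies of the reference.
    return [c for c in candidates if match_score(c, reference) >= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.random() for _ in range(40)]
    synthetic = synthesize(reference, [brighten, crop, drop_every_other])
    unrelated = [[rng.random() for _ in range(40)] for _ in range(5)]
    confirmed = mine_and_filter(synthetic + unrelated, reference, threshold=0.95)
    for video in synthetic:
        print(len(video), round(match_score(video, reference), 2))
    print("confirmed examples:", len(confirmed))
```

In this toy setup, brightness shifts and crops leave the fingerprint intact, while frame dropping may not. Transformations of the latter kind are the examples a retrained matcher would need to learn.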

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.