Abstract

Analyzing large-scale media datasets, e.g., for training artificial-intelligence models, can be inefficient when separate, post-facto analysis pipelines redundantly re-process data. A framework can extract scene-level signals from media content concurrently with media transcoding. By leveraging a common decoding step for both transcoding and analysis, such a system can partition media into scenes and compute quantitative signals related to, for example, visual quality, motion dynamics, and production attributes. The extracted signals can be stored in a structured, indexed database, creating a queryable dataset from a media archive. This approach can provide a scalable, computationally efficient method for curating datasets used to pre-filter media, condition generative models, and evaluate model performance across specific content characteristics.
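The pipeline described above can be sketched as follows. This is an illustrative toy, not the disclosed implementation: the "decoded frames" are plain lists of luma values standing in for the output of the shared decode step, the cut-detection threshold and per-scene signals (average luma, average motion) are hypothetical choices, and the SQLite schema is an assumed example of the structured, indexed signal store.

```python
import sqlite3

# Hypothetical stand-in for the shared decode step's output: each "frame"
# is a list of pixel luma values. A real system would receive these frames
# once from the transcoder's decoder and fan them out to analysis.
frames = (
    [[10] * 16] * 30 +                                        # scene 1: dark, static
    [[200] * 16] * 20 +                                       # scene 2: bright, static
    [[v % 256 for v in range(i, i + 16)] for i in range(40)]  # scene 3: gradual motion
)

def mean_luma(frame):
    return sum(frame) / len(frame)

def frame_diff(a, b):
    # Mean absolute per-pixel difference between consecutive frames.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Partition the stream into scenes with a simple cut detector
# (illustrative threshold; a production detector would be more robust).
CUT_THRESHOLD = 50.0
scenes, start = [], 0
for i in range(1, len(frames)):
    if frame_diff(frames[i - 1], frames[i]) > CUT_THRESHOLD:
        scenes.append((start, i))
        start = i
scenes.append((start, len(frames)))

# Compute quantitative per-scene signals and store them in an
# indexed SQLite table (assumed example schema).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE scene_signals (
    scene_id INTEGER PRIMARY KEY,
    start_frame INTEGER, end_frame INTEGER,
    avg_luma REAL, avg_motion REAL)""")
db.execute("CREATE INDEX idx_luma ON scene_signals (avg_luma)")

for sid, (s, e) in enumerate(scenes):
    lumas = [mean_luma(f) for f in frames[s:e]]
    motions = [frame_diff(frames[j], frames[j + 1]) for j in range(s, e - 1)]
    db.execute("INSERT INTO scene_signals VALUES (?, ?, ?, ?, ?)",
               (sid, s, e, sum(lumas) / len(lumas),
                sum(motions) / max(len(motions), 1)))
db.commit()

# The archive is now queryable, e.g. to pre-filter for bright,
# low-motion scenes when curating a training dataset.
rows = db.execute("SELECT scene_id, avg_luma FROM scene_signals "
                  "WHERE avg_luma > 100 AND avg_motion < 1").fetchall()
print(rows)  # → [(1, 200.0)]
```

Because analysis consumes the same decoded frames the transcoder already produced, the incremental cost of signal extraction is only the per-frame arithmetic and the database writes, not a second decode pass over the archive.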

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
