Abstract

Evaluating the consistency of human ratings is important for developing and improving machine learning algorithms. For example, such algorithms can be used to identify trends in online video content, including short-form videos. Human-provided ratings are used to calculate metrics that are useful for reporting and algorithm development. However, the human rating process is complex, involving factors such as instruction development, onboarding and training of human raters across different geographies/markets and languages, and obtaining ratings for a diverse set of video content. As a result, obtained ratings always carry the potential for human error. This disclosure describes techniques to measure the consistency of ratings provided by human raters. Per the techniques, content to be rated (e.g., videos, video clusters, etc.) is provided in a first batch and ratings are obtained. The content of the first batch is then resampled, anonymized, and reshuffled, and provided as a second batch to the same or different raters. A surrogate ID map that links the anonymized identifiers in the second batch to the original content identifiers is maintained. Metrics from the two batches are collated and analyzed using the surrogate ID map to identify inconsistencies.
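As an illustration of the resampling and surrogate-ID mechanism described above, the Python sketch below shows one possible way to build the anonymized second batch and collate ratings from the two batches. It is a minimal sketch under assumed data shapes; the names (build_resampled_batch, consistency_report) and the example rating labels are hypothetical and not part of the disclosure.

import random
import uuid

def build_resampled_batch(first_batch):
    # Resample the first batch into a second batch: assign each item an
    # anonymized surrogate ID, record the surrogate-to-original mapping,
    # and reshuffle the order so it gives no hints about the first batch.
    surrogate_id_map = {}
    second_batch = []
    for item in first_batch:
        surrogate_id = uuid.uuid4().hex
        surrogate_id_map[surrogate_id] = item["content_id"]
        second_batch.append({"content_id": surrogate_id, "video": item["video"]})
    random.shuffle(second_batch)
    return second_batch, surrogate_id_map

def consistency_report(first_ratings, second_ratings, surrogate_id_map):
    # Collate ratings from both batches using the surrogate ID map and
    # flag items whose ratings disagree between the two batches.
    # first_ratings:  {original_content_id: rating}
    # second_ratings: {surrogate_id: rating}
    inconsistencies = []
    for surrogate_id, rating_2 in second_ratings.items():
        original_id = surrogate_id_map[surrogate_id]
        rating_1 = first_ratings.get(original_id)
        if rating_1 is not None and rating_1 != rating_2:
            inconsistencies.append((original_id, rating_1, rating_2))
    return inconsistencies

# Toy usage: three videos rated in two passes; the report lists disagreements.
first_batch = [{"content_id": f"video_{i}", "video": f"clip_{i}.mp4"} for i in range(3)]
second_batch, id_map = build_resampled_batch(first_batch)
first_ratings = {"video_0": "safe", "video_1": "sensitive", "video_2": "safe"}
second_ratings = {sid: "safe" for sid in id_map}
print(consistency_report(first_ratings, second_ratings, id_map))

In practice, the per-item disagreements flagged this way can be aggregated into consistency metrics across raters, markets, or content types, which is the analysis step the disclosure refers to.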

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
