The intelligibility of speech within media content, e.g., audio or video streams, is an important factor in the reach and popularity of that media. Objective measures of audio and speech quality, e.g., Perceptual Evaluation of Speech Quality (PESQ) and Speech Intelligibility Index (SII) scores, correlate poorly with human assessment. Mean opinion score (MOS) testing, a widely accepted subjective test, is expensive and time-consuming.
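Techniques disclosed herein provide an objective measure of the intelligibility of speech within video or audio content. Human transcribers, sourced through crowdsourcing, transcribe short clips of the speech. A speech intelligibility score is then calculated from the edit distance between these human transcriptions and the transcripts that an automatic speech recognizer (ASR) produces for the same clips. Because the score is derived from human transcriptions, it reflects how well people actually understand the speech. Because the edit distance is computed mechanically, the score remains objective.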
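The scoring step can be sketched as follows. The disclosure does not specify several details: how text is tokenized, which transcript serves as the reference, how the distance is normalized, or how multiple crowd transcriptions are combined. This sketch makes its own assumptions for each. It uses word-level Levenshtein distance, normalizes by the length of the longer transcript, and averages scores first over crowd workers and then over clips. The function names `tokenize`, `word_edit_distance`, `clip_score`, and `intelligibility_score` are hypothetical and are not part of the disclosure.

```python
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop punctuation (an assumed normalization)."""
    tokens = []
    for raw in text.lower().split():
        tok = "".join(ch for ch in raw if ch.isalnum() or ch == "'")
        if tok:  # skip tokens that were pure punctuation
            tokens.append(tok)
    return tokens


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens; each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, word_b in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                       # delete word_a
                curr[j - 1] + 1,                   # insert word_b
                prev[j - 1] + (word_a != word_b),  # substitute, or match at no cost
            )
        prev = curr
    return prev[len(b)]


def clip_score(human_transcript: str, asr_transcript: str) -> float:
    """Map one human/ASR transcript pair to [0, 1]; 1.0 means identical word sequences."""
    h, r = tokenize(human_transcript), tokenize(asr_transcript)
    longest = max(len(h), len(r))
    if longest == 0:
        return 1.0  # both transcripts empty: nothing to disagree about
    # Assumption: normalize by the longer transcript. Edit distance never exceeds
    # that length, so the score stays in [0, 1] and is symmetric in its arguments.
    return 1.0 - word_edit_distance(h, r) / longest


def intelligibility_score(clips: list[tuple[list[str], str]]) -> float:
    """Each clip is (crowd transcriptions, ASR transcript).

    Assumption: average over crowd workers per clip, then over clips.
    """
    per_clip = [
        mean(clip_score(human, asr) for human in humans)
        for humans, asr in clips
    ]
    return mean(per_clip)
```

Consider one clip whose ASR transcript is "The quick brown fox jumps." and whose three crowd transcriptions are "the quick brown fox jumps", "the quick brown box jumps", and "quick brown fox". These yield clip scores of 1.0, 0.8, and 0.6, so the overall intelligibility score is about 0.8. An alternative is a word-error-rate-style ratio normalized by the ASR transcript alone. However, that ratio can exceed 1 when a transcriber inserts extra words, which is why this sketch normalizes by the longer transcript instead.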
This work is licensed under a Creative Commons Attribution 4.0 License.
Chua, Edrei; Fedor, Jason; Collins, Caile; and Malenfant, Aaron, "Quantifying speech intelligibility based on crowdsourcing", Technical Disclosure Commons, (December 01, 2017)