Abstract

The intelligibility of speech within media content, e.g., audio or video streams, is an important factor in determining the reach and popularity of that media. Objective measures of audio and speech quality, e.g., PESQ (Perceptual Evaluation of Speech Quality) and SII (Speech Intelligibility Index) scores, correlate poorly with human assessment. The mean opinion score (MOS), a widely accepted intelligibility test, is subjective, expensive, and time-consuming.

Techniques disclosed herein provide an objective measure of the intelligibility of speech within video or audio content. Speech intelligibility scores are calculated from the edit distance between human transcriptions of short clips and transcripts of the same clips produced by an automatic speech recognizer. The resulting score is grounded in human input yet remains objective.
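As an illustration of the edit-distance comparison described above, the following is a minimal sketch, not the disclosed implementation. It assumes a word-level Levenshtein distance, lowercase whitespace tokenization, and a score normalized to the range 0 to 1 by the longer transcript's length, where higher agreement between the two transcripts yields a higher score. The abstract specifies none of these choices, and the function names `edit_distance` and `intelligibility_score` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def intelligibility_score(human_transcript, asr_transcript):
    """Assumed scoring: 1.0 for identical transcripts, 0.0 for no overlap.

    Word-level tokenization and length normalization are assumptions
    made for this sketch; the abstract does not specify them.
    """
    human = human_transcript.lower().split()
    asr = asr_transcript.lower().split()
    longest = max(len(human), len(asr))
    if longest == 0:
        return 1.0  # two empty transcripts agree trivially
    return 1.0 - edit_distance(human, asr) / longest


if __name__ == "__main__":
    print(intelligibility_score("the cat sat on the mat",
                                "the cat sat on a mat"))
```

In a crowdsourced setting, one might average such per-clip scores over several human transcribers; that aggregation step is likewise an assumption and is not described in the abstract.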

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Chua, Edrei; Fedor, Jason; Collins, Caile; and Malenfant, Aaron, "Quantifying speech intelligibility based on crowdsourcing", Technical Disclosure Commons, (December 01, 2017)
https://www.tdcommons.org/dpubs_series/843

Technical Disclosure Commons

Defensive Publications Series

Quantifying speech intelligibility based on crowdsourcing
