Abstract

Whispered speech presents challenges in voice communication because it lacks the vocal cord vibration, fundamental frequency, and harmonic structure found in voiced speech. This results in reduced intelligibility and an unnatural sound during transmission. To address these limitations, a generative spectral mapping method is disclosed. The method utilizes a deep neural network to map the formant structure of whispered audio to a reconstructed harmonic structure. Missing pitch information is inferred from intensity dynamics and semantic context, while speaker identity is maintained through conditioning on a speaker embedding vector. A frame-based generative vocoder processes audio in small segments to allow for real-time conversion. This technology enables the transformation of whispered input into fully voiced speech, improving privacy and clarity in shared environments without requiring specialized hardware sensors.
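The frame-based pipeline described above can be sketched in a few lines. The sketch below is illustrative only: `spectral_map` is a hypothetical placeholder for the disclosed deep neural network (which would also consume intensity dynamics and semantic context to infer pitch), and the frame size, hop size, and embedding dimension are assumptions, not values from the disclosure. What it does show accurately is the frame-based structure that enables real-time conversion: windowed segments are transformed independently and reconstructed by overlap-add.

```python
import numpy as np

FRAME = 256  # samples per frame (assumed value)
HOP = 128    # hop size for 50% overlap (assumed value)

def spectral_map(mag, speaker_emb):
    """Placeholder for the disclosed DNN: maps a whispered-speech
    magnitude spectrum, conditioned on a speaker embedding, to a
    harmonically structured spectrum. Here a trivial gain stands in
    for the learned model."""
    gain = 1.0 + 0.1 * np.tanh(speaker_emb.mean())
    return mag * gain

def convert(whisper, speaker_emb):
    """Frame-based conversion: process small windowed segments so the
    system can run in a streaming, real-time fashion, then reconstruct
    the output with windowed overlap-add."""
    window = np.hanning(FRAME)
    out = np.zeros(len(whisper))
    norm = np.zeros(len(whisper))
    for start in range(0, len(whisper) - FRAME + 1, HOP):
        frame = whisper[start:start + FRAME] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        mag = spectral_map(mag, speaker_emb)       # model inference per frame
        voiced = np.fft.irfft(mag * np.exp(1j * phase), FRAME)
        out[start:start + FRAME] += voiced * window
        norm[start:start + FRAME] += window ** 2
    return out / np.maximum(norm, 1e-8)            # overlap-add normalization
```

Because each frame depends only on the current segment and a fixed speaker embedding, latency is bounded by the frame length plus model inference time, which is what makes real-time operation feasible.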

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
