Abstract

This disclosure describes techniques to enhance multi-channel speech denoising using voice activity detection (VAD). Per the techniques, VAD is leveraged to detect segments of the microphone signals in which the user’s speech is absent but background noise is present. For such segments, the microphone output to the headset is set to silence. Effectively, VAD gates the microphone output based on the presence or absence of the user’s speech. This gating acts as a form of denoising, since noise is completely suppressed at least in segments where the user’s speech is absent. Three variants are described: a single-channel implementation; a multi-channel implementation (where spatial cues are used to determine VAD-based gating); and residual noise-based VAD (where user speech captured from a reference microphone is used as ground truth). Because the denoiser output is gated by a VAD model, a lighter denoiser model can be used, with noise that leaks through the denoiser still suppressed by the VAD gate in segments where speech is absent.
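
The gating step described above can be illustrated with a minimal sketch. It is not the disclosure's implementation. It assumes the denoiser output is already available as a list of samples and that some VAD model has produced one speech probability per frame. The function name `vad_gate`, the `threshold` value, and the optional `hangover_frames` parameter (which keeps the gate open briefly after speech ends) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def vad_gate(
    samples: Sequence[float],
    speech_probs: Sequence[float],
    frame_len: int,
    threshold: float = 0.5,      # assumed decision threshold, not from the disclosure
    hangover_frames: int = 0,    # assumed extra frames kept open after speech ends
) -> List[float]:
    """Return a copy of `samples` with non-speech frames set to silence.

    Frame i covers samples [i * frame_len, (i + 1) * frame_len).
    Samples beyond the last VAD frame are passed through unchanged.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")

    out = list(samples)
    # Start with the gate closed: no speech has been seen yet.
    frames_since_speech = hangover_frames + 1

    for i, prob in enumerate(speech_probs):
        start = i * frame_len
        if start >= len(out):
            break
        end = min(start + frame_len, len(out))

        if prob >= threshold:
            frames_since_speech = 0
        else:
            frames_since_speech += 1

        # The gate stays open during speech and for `hangover_frames` afterwards.
        if frames_since_speech > hangover_frames:
            out[start:end] = [0.0] * (end - start)

    return out
```

For example, with two-sample frames and per-frame probabilities of 0.9, 0.1, and 0.8, only the middle frame is zeroed. A nonzero `hangover_frames` is one common way to avoid clipping the tail of an utterance, at the cost of letting a little noise through after speech ends.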

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
