Abstract

This article describes an AI-based enhancement for audio transcription that uses video data to improve the accuracy and context of transcripts produced during conference calls. By analyzing participants' gestures, facial expressions, and other non-verbal actions, the technology aims to produce an enhanced transcript that captures deeper levels of communication and the detailed collaboration context of a meeting, creating a more engaging experience for both in-person and remote participants.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.

Recommended Citation

INC, HP, "Vision AI-based Audio Transcription Enhancement", Technical Disclosure Commons, (August 26, 2025)
https://www.tdcommons.org/dpubs_series/8505

Technical Disclosure Commons

Defensive Publications Series

Vision AI-based Audio Transcription Enhancement
