Abstract

In traditional video conferencing with screensharing, presenters struggle to reference specific on-screen elements effectively due to the limitations of small video feeds and basic pointer tools. This disclosure describes video conferencing (VC) techniques for enhancing the ability to point (deixis) in a videoconference using synthesized hand gestures. The video feed of the presenter is integrated into the shared screen, while synthesized hand gestures are overlaid to align with speech and pointer movements. With user permission, a multimodal machine learning pipeline accepts as input shared screen content, pointer/controller data, and ongoing speech to generate gestures and to determine optimal video placement. Modules for user interface understanding, gesture synthesis, and spatial placement optimization ensure that gestures align contextually with ongoing speech and VC content.
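To make the described pipeline concrete, the following is a minimal sketch of how the three modules (user interface understanding, gesture synthesis, spatial placement optimization) could be wired together per captured frame. All class, method, and field names here are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical sketch of the multimodal pipeline described in the abstract.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    """One tick of presenter input, captured with user permission."""
    screen_image: bytes            # shared screen content
    pointer_xy: Tuple[int, int]    # pointer/controller position in screen coordinates
    speech_text: str               # transcript of ongoing speech


@dataclass
class Overlay:
    """What gets composited onto the shared screen."""
    gesture_sprite: bytes                           # synthesized hand gesture image
    gesture_anchor: Tuple[int, int]                 # point the gesture indicates
    presenter_feed_rect: Tuple[int, int, int, int]  # placement of the presenter's video feed


class DeixisPipeline:
    def __init__(self, ui_model, gesture_model, placement_model):
        # The three modules named in the abstract: UI understanding,
        # gesture synthesis, and spatial placement optimization.
        self.ui_model = ui_model
        self.gesture_model = gesture_model
        self.placement_model = placement_model

    def step(self, frame: Frame) -> Optional[Overlay]:
        # 1. Identify the on-screen element being referred to, combining
        #    screen content, pointer data, and speech.
        target = self.ui_model.locate_referent(
            frame.screen_image, frame.pointer_xy, frame.speech_text
        )
        if target is None:
            return None  # no deictic reference detected in this frame

        # 2. Synthesize a hand gesture aligned with the speech and target element.
        sprite = self.gesture_model.synthesize(target, frame.speech_text)

        # 3. Choose a placement for the presenter's video feed that avoids
        #    occluding the referenced element.
        rect = self.placement_model.best_placement(frame.screen_image, target)

        return Overlay(
            gesture_sprite=sprite,
            gesture_anchor=target.center,
            presenter_feed_rect=rect,
        )
```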

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
