Abstract

This disclosure presents a method for gesture-based virtual tool interaction on arbitrary surfaces within an Extended Reality (XR) environment. Using a head-worn system with egocentric cameras, supplemented by wearable sensors, the system tracks three-dimensional (3D) hand poses via a neural-network or machine-learning tracking model. Unlike prior contact-detection methods, this approach employs a dual-network architecture: a static network classifies specific hand grasps (e.g., tripod or wide grips) to instantiate and spatially anchor corresponding virtual tools (e.g., pencils or erasers) with six-degrees-of-freedom (6DoF) precision, while a secondary temporal neural network analyzes sequence-based micro-gestures for context-aware tool control, such as adjusting line width or color. By prioritizing gesture-driven tool selection over traditional menus, the system transforms any physical plane into a dynamic digital interface, reducing the need for specialized hardware or touch-sensitive surfaces.
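The dual-network control loop described above can be sketched roughly as follows. This is a minimal illustrative mock, not the disclosed implementation: the two "networks" are stub classifiers standing in for trained models, and all class names, feature choices, thresholds, and tool labels are assumptions introduced here for clarity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class HandPose:
    # Simplified per-frame features a hand-tracking model might emit
    # (hypothetical; the actual system uses full 3D hand pose):
    pinch_distance: float  # normalized thumb-to-index-tip distance
    spread_angle: float    # finger-spread angle in degrees

def classify_grasp(pose: HandPose) -> Optional[str]:
    """Stand-in for the static grasp network: maps one frame to a tool."""
    if pose.pinch_distance < 0.03 and pose.spread_angle < 30.0:
        return "pencil"   # tripod-like grip instantiates a pencil
    if pose.spread_angle > 70.0:
        return "eraser"   # wide grip instantiates an eraser
    return None           # no recognized grasp, so no tool is anchored

def classify_micro_gesture(window: deque) -> Optional[str]:
    """Stand-in for the temporal network: inspects a short pose sequence."""
    if len(window) < 2:
        return None
    drift = window[-1].pinch_distance - window[0].pinch_distance
    if drift > 0.02:
        return "increase_line_width"
    if drift < -0.02:
        return "decrease_line_width"
    return None

class GestureToolController:
    """Runs both classifiers on each incoming tracking frame."""

    def __init__(self, window_size: int = 8):
        self.window: deque = deque(maxlen=window_size)
        self.active_tool: Optional[str] = None

    def on_frame(self, pose: HandPose) -> Tuple[Optional[str], Optional[str]]:
        self.window.append(pose)
        tool = classify_grasp(pose)
        if tool is not None:
            # In the full system the tool would be anchored at the
            # current 6DoF hand pose; here we only record its type.
            self.active_tool = tool
        command = classify_micro_gesture(self.window) if self.active_tool else None
        return self.active_tool, command
```

For example, a frame matching the tripod-grip thresholds instantiates the pencil, and a subsequent widening of the pinch over the frame window is reported as a line-width adjustment command.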

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
