Abstract

AR/VR interactions typically rely on dedicated controllers or gestures, which can be cumbersome and limit user interaction with digital interfaces. This disclosure describes techniques for transforming everyday objects into interactive interfaces using augmented reality (AR), virtual reality (VR), or extended reality (XR) devices that perform real-time hand-object tracking. The techniques segment video frames into hands and objects using a deep-learning-based segmentation model. Objects are tracked in six degrees of freedom (6DoF) using a motion estimation network and machine-learning-based pose estimation. Simultaneously, 3D hand tracking identifies touch interactions and gestures, which can be combined with voice commands. Cloud-hosted operating system (OS) interfaces are rendered onto physical objects in augmented reality, enabling natural and intuitive interactions such as touch, swipe, and multi-finger gestures. Examples include turning a book into a virtual tablet, a battery pack into a media controller, or a sticky note into a smart thermostat. By combining advanced segmentation, tracking, and gesture recognition, the techniques enable everyday objects to serve as seamless, dynamic controllers for software apps, media-casting devices, smartwatches, mobile devices, cloud-hosted applications, etc., reducing or eliminating the need for dedicated input devices.
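
As a rough illustration of the per-frame flow summarized above (segmentation, 6DoF object tracking, hand tracking, touch detection), the following Python sketch wires the stages together. All class names (HandObjectSegmenter, PoseEstimator6DoF, HandTracker), the fingertip-to-surface distance heuristic for touch detection, and the 1 cm threshold are hypothetical placeholders standing in for the deep-learning components and interaction logic; they are not drawn from the disclosure itself.

    # Minimal per-frame pipeline sketch. The model classes are stubs for the
    # deep-learning components described in the abstract; they are hypothetical
    # and return dummy values so the sketch runs end to end.
    from dataclasses import dataclass
    from typing import Tuple

    import numpy as np


    @dataclass
    class ObjectPose:
        """6DoF pose of a tracked object: 3D translation plus unit-quaternion rotation."""
        translation: np.ndarray  # shape (3,)
        rotation: np.ndarray     # shape (4,)


    class HandObjectSegmenter:
        """Stub for a segmentation model that labels pixels as hand, object, or background."""
        def segment(self, frame: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
            hand_mask = np.zeros(frame.shape[:2], dtype=bool)
            object_mask = np.zeros(frame.shape[:2], dtype=bool)
            return hand_mask, object_mask


    class PoseEstimator6DoF:
        """Stub for the motion-estimation / pose-estimation network tracking the object in 6DoF."""
        def estimate(self, frame: np.ndarray, object_mask: np.ndarray) -> ObjectPose:
            return ObjectPose(np.zeros(3), np.array([0.0, 0.0, 0.0, 1.0]))


    class HandTracker:
        """Stub for 3D hand tracking; returns fingertip positions in the object's coordinate frame."""
        def fingertips(self, frame: np.ndarray, hand_mask: np.ndarray) -> np.ndarray:
            return np.zeros((5, 3))  # five fingertips, xyz in meters


    def detect_touch(fingertips: np.ndarray, pose: ObjectPose, threshold_m: float = 0.01) -> bool:
        """Hypothetical touch heuristic: a touch is registered when any fingertip comes
        within threshold_m of the object's surface (approximated here by its origin)."""
        distances = np.linalg.norm(fingertips - pose.translation, axis=1)
        return bool((distances < threshold_m).any())


    def process_frame(frame, segmenter, pose_estimator, hand_tracker):
        """One iteration of the loop: segment, track the object in 6DoF, track the hand,
        and report whether the interface rendered on the object was touched."""
        hand_mask, object_mask = segmenter.segment(frame)
        pose = pose_estimator.estimate(frame, object_mask)
        fingertips = hand_tracker.fingertips(frame, hand_mask)
        touched = detect_touch(fingertips, pose)
        return pose, touched  # the AR renderer anchors the OS interface to `pose`


    if __name__ == "__main__":
        frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in camera frame
        pose, touched = process_frame(frame, HandObjectSegmenter(),
                                      PoseEstimator6DoF(), HandTracker())
        print(pose, touched)

In an actual system, the renderer would use the returned 6DoF pose each frame to keep the cloud-hosted OS interface visually anchored to the physical object, and touch events would be forwarded to that interface as input.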

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.