Abstract

Movement trajectories derived from timestamped location data stored locally on a user device can be a rich information source, especially for indoor spaces. However, such trajectories cannot be directly used by other devices, such as mobile robots, operating in the same space. Moreover, mobile robots cannot dynamically learn from user movements within that space. This disclosure describes techniques that leverage Visual Language Models (VLMs) as a backbone to enable a device such as a mobile robot to receive the historical movement trajectories of another device within the same space. The VLM backbone transforms the trajectories into lower-level actuation based on higher-level semantic interpretations of the physical space. Users can issue semantic commands to the devices that receive the location trajectories, directing them to perform tasks within the physical space. The techniques enable users to easily control the operation of hardware agents in a physical space and to interact with such agents using intuitive semantic commands.
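
To illustrate the described pipeline, below is a minimal Python sketch of how a receiving device might use a VLM backbone to turn a received trajectory and a semantic user command into low-level actuation. The function trajectory_to_actions, the injected vlm_query callable, and the move_to/stop action vocabulary are hypothetical placeholders introduced only for illustration; they are not part of the disclosure.

    def trajectory_to_actions(trajectory, command, vlm_query, floor_plan_image):
        """Map a semantic user command, grounded by another device's historical
        trajectory, onto low-level actuation steps for a mobile robot.

        trajectory: list of (timestamp, x, y) samples received from the other device.
        vlm_query: placeholder for whatever multimodal model API is available;
                   assumed to accept an image plus a text prompt and return text.
        """
        prompt = (
            "You control a mobile robot in the indoor space shown in the image.\n"
            f"Historical trajectory of another device, as (t, x, y) samples: {trajectory}\n"
            f"User command: {command}\n"
            "Reply with one actuation step per line, e.g. 'move_to 3.2 4.1' or 'stop'."
        )
        reply = vlm_query(image=floor_plan_image, prompt=prompt)

        # Parse the VLM's free-text reply into structured actuation commands.
        actions = []
        for line in reply.splitlines():
            parts = line.strip().split()
            if parts and parts[0] == "move_to" and len(parts) == 3:
                actions.append(("move_to", float(parts[1]), float(parts[2])))
            elif parts and parts[0] == "stop":
                actions.append(("stop",))
        return actions

In this sketch, the trajectory and command are serialized into the prompt alongside an image of the space, and the model's textual reply is parsed into a small set of actuation primitives that the robot can execute.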

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
