Abstract

Indoor navigation presents a significant challenge: Global Positioning System (GPS) signals are unreliable in interior spaces, and static environmental scans often fail to generalize as environments change. To address these limitations, this work proposes a methodology that uses generative models to produce diverse map layouts and vision-language models (VLMs) to interpret potentially walkable areas within those layouts. By scaling this approach to generate a large dataset, the wayfinding capabilities of the models improve significantly. This technology is particularly applicable to navigational agents and wearable devices, such as smart glasses, that require real-time semantic understanding of human-readable graphical maps. The resulting system bridges the gap between raw graphical data and practical indoor navigation, offering a scalable solution for assistive technology in dynamic environments.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
