Abstract
Indoor navigation is challenging because Global Positioning System (GPS) signals are largely unavailable in interior spaces, and static environmental scans often fail to generalize as environments change. To address these limitations, the proposed methodology uses generative models to produce diverse map layouts and vision-language models (VLMs) to identify potentially walkable areas within those layouts. Scaling this approach to generate a large dataset yields significant improvements in the models' wayfinding capabilities. The technology is particularly applicable to navigational agents and wearable devices, such as glasses, that require real-time semantic understanding of human-readable graphical maps. The resulting system bridges the gap between raw graphical data and practical indoor navigation, offering a scalable solution for assistive technology in dynamic environments.
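As a rough illustration of the generate-then-label loop described above, the following Python sketch pairs a stand-in layout generator with a placeholder VLM labeling step to assemble training data at scale. The grid representation, the names MapLayout, generate_layout, query_vlm_for_walkable_areas, and build_dataset, and all parameters are hypothetical placeholders introduced here for illustration; the disclosure does not specify the generative model, the VLM, or the data format.

```python
# Minimal sketch of a generate-then-label data pipeline, under assumed
# placeholder components; not the disclosed implementation.
import random
from dataclasses import dataclass


@dataclass
class MapLayout:
    """A synthetic indoor map grid: 0 = walkable floor, 1 = wall/obstacle."""
    grid: list


def generate_layout(width: int = 16, height: int = 16, wall_prob: float = 0.2) -> MapLayout:
    """Stand-in for the generative model: samples a random layout.
    In the disclosed system this step would be a learned generative model
    producing diverse, human-readable graphical maps."""
    grid = [[1 if random.random() < wall_prob else 0 for _ in range(width)]
            for _ in range(height)]
    return MapLayout(grid=grid)


def query_vlm_for_walkable_areas(layout: MapLayout) -> list:
    """Placeholder for the VLM step: returns coordinates judged walkable.
    Here it simply reads the grid; a real system would render the layout
    as an image and prompt a vision-language model."""
    return [(r, c)
            for r, row in enumerate(layout.grid)
            for c, cell in enumerate(row) if cell == 0]


def build_dataset(num_samples: int = 1000) -> list:
    """Scales the generate-then-label loop into pairs of
    (layout, walkable-area annotations) for training wayfinding models."""
    dataset = []
    for _ in range(num_samples):
        layout = generate_layout()
        walkable = query_vlm_for_walkable_areas(layout)
        dataset.append({"layout": layout.grid, "walkable": walkable})
    return dataset


if __name__ == "__main__":
    data = build_dataset(num_samples=10)
    print(f"Built {len(data)} synthetic samples; "
          f"first sample has {len(data[0]['walkable'])} walkable cells.")
```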
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
MBHB, MBHB; Goyal, Mohit; Panagopoulou, Artemis; Purohit, Aveek; Kulshrestha, Ace; and Yazdani, Soroosh, "Semantic Navigation Using Generative Models and Graphical Maps", Technical Disclosure Commons, (February 10, 2026)
https://www.tdcommons.org/dpubs_series/9308