D Shin


Conventional indoor localization techniques rely on high-precision 3 or 6 degrees-of-freedom (DOF) positioning of the user's device. This may be infeasible if the device lacks positioning sensors such as GPS or an inertial measurement unit (IMU), if such sensors are turned off, or if their accuracy is insufficient. This disclosure describes the use of language modeling techniques to provide indoor navigation without such sensor data, relying instead on the user's local visual context as captured by a camera. Text captions are generated that describe frames of the user's visual context in an indoor space. The captions for the current frame and for recently captured, timestamped frames, together with a suitable prompt and metadata, are input to a large language model (LLM), which determines the user's current location within the indoor space. The techniques can be incorporated into any indoor digital mapping and navigation application, on any device with a camera that can capture the visual context.
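The caption-and-prompt pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosure's implementation. The `FrameCaption` type, the helper names, the 30-second recency window, the prompt wording, and the metadata keys are all hypothetical. The captioning model and the LLM are treated as external: captions arrive as text, and the LLM is any function that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FrameCaption:
    """Hypothetical record: text caption generated for one camera frame."""
    timestamp: float  # capture time in seconds
    text: str         # caption produced by some image-captioning model (assumed)


def recent_captions(captions: List[FrameCaption], window_s: float) -> List[FrameCaption]:
    """Keep captions captured within `window_s` seconds of the newest one, oldest first."""
    if not captions:
        return []
    latest = max(c.timestamp for c in captions)
    kept = [c for c in captions if latest - c.timestamp <= window_s]
    return sorted(kept, key=lambda c: c.timestamp)


def build_localization_prompt(captions: List[FrameCaption],
                              metadata: Dict[str, str],
                              window_s: float = 30.0) -> str:
    """Assemble prompt text: instruction, metadata, then timestamped captions.

    The wording and layout here are illustrative assumptions only.
    """
    lines = [
        "You are helping a user navigate an indoor space.",
        "Based on the following descriptions of what the user's camera has seen,",
        "state the user's most likely current location.",
        "",
        "Metadata:",
    ]
    for key in sorted(metadata):  # sorted for a deterministic prompt
        lines.append(f"- {key}: {metadata[key]}")
    lines.append("")
    lines.append("Observations (oldest first):")
    for c in recent_captions(captions, window_s):
        lines.append(f"[t={c.timestamp:.1f}s] {c.text}")
    return "\n".join(lines)


def locate_user(captions: List[FrameCaption],
                metadata: Dict[str, str],
                llm: Callable[[str], str],
                window_s: float = 30.0) -> str:
    """Send the assembled prompt to any text-in/text-out LLM callable."""
    return llm(build_localization_prompt(captions, metadata, window_s))


if __name__ == "__main__":
    caps = [
        FrameCaption(0.0, "Building entrance with a revolving door"),
        FrameCaption(50.0, "Hallway with vending machines on the left"),
        FrameCaption(60.0, "Door labeled Room 204"),
    ]
    meta = {"building": "Example Office", "floor": "2"}  # hypothetical metadata
    print(build_localization_prompt(caps, meta))
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model API. In a real application it would wrap whichever captioning and language-model services the navigation app uses, and the recency window would be tuned to how quickly the user moves through the space.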

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.