Abstract

Current digital maps and simulation tools often rely on discrete, static imagery, which hinders smooth movement and limits both realism and the extent of the explorable area. This document introduces a system that uses multiple types of input to generate continuous, navigable, ground-level video from overhead map data. The system synthesizes each new video frame autoregressively, conditioned on the previous frame together with inputs such as satellite imagery, user movement commands, and text instructions (e.g., "make it snowy"). It is built on a flow-matching architecture and stabilized with techniques such as noise injection. This method enables smooth, user-controlled navigation and real-time modification of the visual environment, providing a way to generate dynamic, explorable virtual worlds from geographic data.
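The frame-by-frame generation loop described above can be sketched in a few lines. This is a minimal illustrative mock-up, not the system's implementation: `denoise_step` is a hypothetical stand-in for the learned flow-matching model, and the control dictionary, step counts, and noise level are all assumptions. It shows only the structure the abstract describes: each frame is produced from the previous one, with noise injected into the conditioning frame to stabilize long autoregressive rollouts.

```python
import numpy as np

def denoise_step(frame, controls, t):
    # Hypothetical stand-in for the learned flow-matching velocity field:
    # a simple blend toward a target image so the sketch runs without weights.
    return frame + t * (controls["target"] - frame)

def generate_rollout(init_frame, controls, num_frames,
                     noise_std=0.05, steps=4, seed=0):
    """Autoregressive rollout: each new frame is generated conditioned on
    the previous frame plus control inputs. Noise injection on the
    conditioning frame mimics the stabilization technique mentioned in
    the abstract, limiting error accumulation over long rollouts."""
    rng = np.random.default_rng(seed)
    frames = [init_frame]
    for _ in range(num_frames):
        # Inject noise into the conditioning (previous) frame.
        cond = frames[-1] + rng.normal(0.0, noise_std, size=init_frame.shape)
        x = cond
        for _ in range(steps):  # few-step integration (illustrative)
            x = denoise_step(x, controls, t=1.0 / steps)
        frames.append(x)
    return frames

# Roll out 8 frames from a blank start toward an all-ones "target" scene.
frames = generate_rollout(np.zeros((4, 4)), {"target": np.ones((4, 4))},
                          num_frames=8)
```

In the real system, `denoise_step` would be a neural network conditioned on satellite imagery, movement commands, and text, and the frames would be images rather than toy arrays; the loop structure is the point of the sketch.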

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
