Abstract

Current digital maps and simulation tools often rely on discrete, static imagery, which hinders smooth movement and limits both realism and the extent of the explorable area. This document introduces a system that uses multiple types of input to generate continuous, navigable, ground-level video from overhead map data. The system synthesizes each new video frame autoregressively, conditioned on the previous frame together with inputs such as satellite imagery, user movement commands, and text instructions (e.g., "make it snowy"). It is built on a flow-matching architecture and stabilized with techniques such as noise injection. This method enables smooth, user-controlled navigation and real-time modification of the visual environment, providing a way to generate dynamic, explorable virtual worlds from geographic data.
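The frame-by-frame generation loop described above can be sketched in a few lines. This is a minimal illustrative mock-up, not the system's implementation: `denoise_step` is a hypothetical stand-in for the learned flow-matching model, and the control dictionary, step counts, and noise level are all assumptions. It shows only the structure the abstract describes: each frame is produced from the previous one, with noise injected into the conditioning frame to stabilize long autoregressive rollouts.

```python
import numpy as np

def denoise_step(frame, controls, t):
    # Hypothetical stand-in for the learned flow-matching velocity field:
    # a simple blend toward a target image so the sketch runs without weights.
    return frame + t * (controls["target"] - frame)

def generate_rollout(init_frame, controls, num_frames,
                     noise_std=0.05, steps=4, seed=0):
    """Autoregressive rollout: each new frame is generated conditioned on
    the previous frame plus control inputs. Noise injection on the
    conditioning frame mimics the stabilization technique mentioned in
    the abstract, limiting error accumulation over long rollouts."""
    rng = np.random.default_rng(seed)
    frames = [init_frame]
    for _ in range(num_frames):
        # Inject noise into the conditioning (previous) frame.
        cond = frames[-1] + rng.normal(0.0, noise_std, size=init_frame.shape)
        x = cond
        for _ in range(steps):  # few-step integration (illustrative)
            x = denoise_step(x, controls, t=1.0 / steps)
        frames.append(x)
    return frames

# Roll out 8 frames from a blank start toward an all-ones "target" scene.
frames = generate_rollout(np.zeros((4, 4)), {"target": np.ones((4, 4))},
                          num_frames=8)
```

In the real system, `denoise_step` would be a neural network conditioned on satellite imagery, movement commands, and text, and the frames would be images rather than toy arrays; the loop structure is the point of the sketch.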

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
