Abstract

Users enjoy creating slideshows, animated videos, and similar content that capture their trips. Current techniques for preparing such creations are limited by the content of the captured photographs and by the user's knowledge of the relevant creation features. This disclosure describes the use of artificial intelligence to automatically generate a dynamic video journey from a collection of user photographs. With user permission, an artificial intelligence model performs semantic analysis of the user's photos to infer the geographic path the user followed while capturing them. A generative video model accesses street-level images along the path and synthesizes a base video that simulates first-person movement along it. The user’s photos are synchronized in time and perspective with the base video and are transformed into short animations using a multimodal generative model. Context-aware outpainting from the boundaries of each generated animation blends the animation with the base video, producing a visual summary that enables the user to relive and share their travel memories.
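As a rough illustration of the synchronization step only, the sketch below orders photos by capture time and maps each one to a time offset in the base video in proportion to the distance traveled along the path. It assumes that a location has already been inferred for each photo; the `Photo` type, the `place_on_timeline` function, and the distance-proportional mapping are hypothetical choices made for this example and are not specified in the disclosure.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    """Hypothetical record: one photo with an already-inferred location."""
    name: str
    timestamp: float  # capture time, seconds since epoch
    lat: float        # inferred latitude in degrees
    lon: float        # inferred longitude in degrees


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photo locations, in kilometers."""
    earth_radius_km = 6371.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(h))


def place_on_timeline(photos: list[Photo], video_seconds: float):
    """Return (photo name, offset in seconds) pairs in capture order.

    Assumption for this sketch: the base video moves along the path at a
    constant speed, so a photo's offset is proportional to the cumulative
    distance from the first photo to that photo.
    """
    ordered = sorted(photos, key=lambda p: p.timestamp)
    cumulative = [0.0]
    for prev, cur in zip(ordered, ordered[1:]):
        cumulative.append(cumulative[-1] + haversine_km(prev, cur))
    total = cumulative[-1]
    return [
        (p.name, video_seconds * d / total if total else 0.0)
        for p, d in zip(ordered, cumulative)
    ]
```

A real system would also need the perspective alignment and generative steps described above, which this sketch does not attempt.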

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
