Abstract
Three-dimensional (3D) virtual environments synthesized from two-dimensional (2D) images currently lack an audiovisual accompaniment appropriate to the scenes in the images, which detracts from the immersive experience. This disclosure describes techniques to generate a spatial audio track appropriate to a 3D virtual scene and to insert the audio track into the 3D visual scene. 2D images of a given location are analyzed by a visual language model to obtain a text description of the location. The text description is provided as input to a music language model to generate audio that is appropriate to the text description (and hence to the underlying image set). The generated audio is subjected to spatial encoding, source separation, and/or remixing to spatially associate sounds with the locations of their corresponding sources in the virtual space. The techniques can provide an immersive audiovisual experience within augmented/extended/virtual reality environments.
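The pipeline described above can be sketched as follows. This is a minimal illustrative sketch only: the function names and the stub model calls below are hypothetical placeholders, not the disclosed implementation or any real model API.

```python
# Illustrative sketch of the disclosed pipeline: 2D images -> text description
# (visual language model) -> audio stems (music language model) -> spatialized
# sources. All functions below are hypothetical stubs for exposition.

def describe_scene(images):
    """Placeholder for a visual language model that converts 2D images of a
    location into a text description of the scene."""
    return "a quiet forest clearing with a stream to the left and birds overhead"

def generate_audio_stems(description):
    """Placeholder for a music language model plus source separation: returns
    one audio stem (raw bytes here) per identified sound source."""
    return {"stream": b"<pcm-audio>", "birds": b"<pcm-audio>"}

def spatialize(stems, source_positions):
    """Placeholder spatial encoding/remixing step: associates each separated
    sound source with its 3D position in the virtual scene."""
    return [{"source": name, "audio": audio, "position": source_positions[name]}
            for name, audio in stems.items()]

# Hypothetical inputs: a few 2D views of the location and assumed 3D
# coordinates (x, y, z) for each sound source in the virtual scene.
images = ["view_front.jpg", "view_left.jpg"]
description = describe_scene(images)
stems = generate_audio_stems(description)
positions = {"stream": (-2.0, 0.0, 1.0), "birds": (0.0, 3.0, 0.0)}
spatial_track = spatialize(stems, positions)
```

In a real system, `describe_scene` and `generate_audio_stems` would be calls into the respective models, and `spatialize` would render each stem with a spatial audio encoder so that the sounds appear to emanate from the corresponding locations in the scene.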
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Shin, D. and Nongpiur, Rajeev, "Artificial Intelligence Generated Spatial Audio for Immersive Virtual Reality Scenes", Technical Disclosure Commons, (November 05, 2024)
https://www.tdcommons.org/dpubs_series/7499