Abstract

The adoption of extended reality (XR) headsets is limited by a lack of native volumetric content, which leaves standard two-dimensional media libraries presented as flat images. Traditional depth-mapping approaches introduce severe visual stretching artifacts during six-degrees-of-freedom movement (i.e., changes in both position and orientation) because no image data exists for the regions hidden behind foreground objects. The disclosed technology is a generative artificial intelligence (AI) pipeline that synthesizes volumetric media from standard monocular two-dimensional inputs (e.g., an image or a video frame). The pipeline estimates depth from the two-dimensional input to construct a three-dimensional point cloud, computes left-eye and right-eye stereoscopic perspectives from head-pose tracking, and uses a generative diffusion model to dynamically fill in the missing background textures and lighting in occluded regions. These generated elements are compiled using three-dimensional Gaussian Splatting (or another rendering mechanism, such as meshes), enabling users to view media with spatial parallax without visual tearing or the need for specialized capture hardware.
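
The first two pipeline stages (depth map to point cloud, then per-eye views from head pose) can be illustrated with a minimal sketch. The abstract does not specify a camera model, so everything here is an assumption made for illustration: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, eye cameras that differ from the source camera only by translation, and a default interpupillary distance of 0.063 m. The function names are likewise hypothetical, and the depth estimator, the diffusion-based inpainting, and the Gaussian Splatting stage are not modeled.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a per-pixel depth map into camera-space 3D points.

    Assumes a pinhole camera (an assumption, not stated in the abstract).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def eye_positions(head: Point, right_axis: Point,
                  ipd: float = 0.063) -> Tuple[Point, Point]:
    """Offset the tracked head position by half the IPD along the head's
    right axis (expected to be a unit vector) to get left/right eye positions."""
    half = ipd / 2.0
    left = tuple(h - half * r for h, r in zip(head, right_axis))
    right = tuple(h + half * r for h, r in zip(head, right_axis))
    return left, right


def project(point: Point, eye: Point, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D point into an eye camera that is translated, but not
    rotated, relative to the source camera (a simplifying assumption)."""
    x, y, z = (p - e for p, e in zip(point, eye))
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    cloud = unproject_depth([[2.0, 2.0], [2.0, 0.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    left_eye, right_eye = eye_positions((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    for p in cloud:
        print(project(p, left_eye, 1.0, 1.0, 0.5, 0.5),
              project(p, right_eye, 1.0, 1.0, 0.5, 0.5))
```

Under these assumptions, the horizontal disparity between the two eye views of a point at depth `z` is `fx * ipd / z`, so nearer points shift more between the eyes. Pixels that receive no projected point in either view are where occluded content would have to be generated.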

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
