Abstract
Large language models (LLMs) are capable of answering questions with reference to images or videos. However, LLMs sometimes get spatial relationships wrong. A key challenge in training LLMs for three-dimensional (3D) scene understanding is the lack of suitably large image and video datasets. This disclosure describes the use of a simulated 3D environment to obtain two-dimensional (2D) images or videos of the environment from different perspectives. A virtual camera viewpoint is moved through the simulated 3D space, and a stream of 2D images is captured from the different positions. The simulated 3D environment can be configured to model scenarios where the LLM has relatively low accuracy. Using the gathered images and the ground truth available from the simulation, supervised fine-tuning (SFT) of the LLM can be performed. Since data gathering is automated, large amounts of training data can be collected at low cost. Further, the simulated 3D environment can be structured to mimic real-world use cases where an LLM answers spatial questions, e.g., in image/video understanding tasks.
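The disclosure does not specify an implementation, but the data-generation pipeline can be sketched. The following Python snippet is a minimal, illustrative example (not the author's method): a toy scene, the look_at(), render(), and spatial_relation() helpers, and the object names are all hypothetical, and render() is a stub standing in for a real 3D engine such as Blender or Unity. The key idea it shows is that because object and camera positions are known in the simulation, ground-truth answers to spatial questions can be computed automatically for each captured frame.

```python
# Illustrative sketch: generate (image, question, ground-truth answer) triples
# from a simulated 3D scene for supervised fine-tuning. All names here are
# assumptions for illustration; render() is a placeholder for a 3D engine.

import json
import numpy as np

# A toy scene: object names mapped to 3D world positions (x, y, z).
SCENE = {
    "red cube": np.array([1.0, 0.0, 3.0]),
    "blue ball": np.array([-1.5, 0.0, 4.0]),
}

def look_at(camera_pos, target):
    """World-to-camera rotation for a camera at camera_pos facing target."""
    forward = target - camera_pos
    forward /= np.linalg.norm(forward)
    up = np.array([0.0, 1.0, 0.0])
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward])  # rows: camera x, y, z axes

def render(scene, camera_pos, rotation, out_path):
    """Placeholder: a real implementation would rasterize the scene with a
    3D engine and save a 2D image at out_path."""
    return out_path

def spatial_relation(a, b, camera_pos, rotation):
    """Ground truth from simulation state: is object a left or right of
    object b as seen from this camera pose?"""
    a_cam = rotation @ (a - camera_pos)
    b_cam = rotation @ (b - camera_pos)
    return "left of" if a_cam[0] < b_cam[0] else "right of"

examples = []
# Move the camera viewpoint through the scene; emit one SFT example per pose.
for i, camera_pos in enumerate([np.array([0.0, 1.0, 0.0]),
                                np.array([4.0, 1.0, 3.0])]):
    rot = look_at(camera_pos, np.array([0.0, 0.0, 3.5]))
    image = render(SCENE, camera_pos, rot, f"frame_{i:04d}.png")
    rel = spatial_relation(SCENE["red cube"], SCENE["blue ball"],
                           camera_pos, rot)
    examples.append({
        "image": image,
        "question": "Is the red cube to the left or right of the blue ball?",
        "answer": f"The red cube is {rel} the blue ball.",
    })

print(json.dumps(examples, indent=2))
```

Because every example is produced programmatically from simulation state, the loop can be scaled to arbitrarily many camera poses, scenes, and relation types at negligible marginal cost, which is the property the disclosure relies on for low-cost data collection.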
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Weisz, Ágoston, "Using Images from a Simulated 3D Environment for LLM Fine-tuning for Image and Video Understanding Tasks", Technical Disclosure Commons, (July 17, 2025).
https://www.tdcommons.org/dpubs_series/8374