Abstract

Disclosed is a system for generating "Golden Master" training datasets for multimodal Vision-Language-Action (VLA) models. The core innovation is a "Deterministic Render-Capture Loop" that decouples simulation time (simTime) from wall-clock time. On each iteration the system advances the simulation by a fixed delta ($\Delta t$), renders the resulting state, and captures the frame using WebCodecs, stamping it with simTime rather than the wall-clock capture time, regardless of how long the render actually took. Because consecutive frames are always exactly $\Delta t$ of simulated time apart, the output has a perfectly constant frame rate, and video frames align exactly with the telemetry logs recorded at the same simTime. This eliminates the frame drops and jitter inherent in real-time screen recording. The pipeline is orchestrated within the TrueOS/Electron shell, which runs the simulation backend (e.g., a Python physics simulator) as a native child process connected over high-throughput IPC.
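
The following is a minimal sketch of the render-capture loop described above, written under stated assumptions and not taken from the disclosure. The names Simulator, FrameSink, TelemetryRecord, simTimeUs and runCapture are hypothetical; in a real Electron renderer the sink would wrap a WebCodecs encoder, whose VideoFrame timestamps are expressed in microseconds. The key property shown is that each frame's timestamp is derived only from its frame index and $\Delta t$, never from a wall clock, and that the telemetry record carries the identical value.

```typescript
// Hypothetical sketch of a deterministic render-capture loop.
// None of these interfaces come from the disclosure; they are illustrative.

/** Telemetry for one fixed simulation step (hypothetical shape). */
type SimState = Record<string, number>;

/** Hypothetical simulation backend, e.g. a proxy over IPC to a Python child process. */
interface Simulator {
  /** Advance the physics by exactly dtSeconds and return the resulting state. */
  step(dtSeconds: number): SimState;
}

/** Hypothetical frame sink; in Electron this might wrap a WebCodecs VideoEncoder. */
interface FrameSink {
  /** Render and encode the given state, stamped with simTime in microseconds. */
  capture(state: SimState, timestampUs: number): void;
}

/** One telemetry row; simTimeUs equals the timestamp of the matching video frame. */
interface TelemetryRecord {
  frameIndex: number;
  simTimeUs: number;
  state: SimState;
}

/**
 * simTime after `steps` fixed steps, in integer microseconds.
 * Computed from the step count rather than by summing dt, so no rounding drift accumulates.
 */
function simTimeUs(steps: number, fps: number): number {
  return Math.round((steps * 1_000_000) / fps);
}

/** Step, render, capture: wall-clock time is never consulted. */
function runCapture(
  sim: Simulator,
  sink: FrameSink,
  fps: number,
  frameCount: number,
): TelemetryRecord[] {
  const dtSeconds = 1 / fps; // fixed delta, independent of render duration
  const log: TelemetryRecord[] = [];
  for (let n = 0; n < frameCount; n++) {
    const state = sim.step(dtSeconds); // advance simulation by exactly dt
    const t = simTimeUs(n + 1, fps); // simTime of the state just produced
    sink.capture(state, t); // frame is stamped with simTime, not capture time
    log.push({ frameIndex: n, simTimeUs: t, state }); // telemetry shares the same key
  }
  return log;
}
```

Deriving each timestamp from the step count, instead of accumulating $\Delta t$ in floating point, keeps the timeline drift-free. At frame rates that do not divide one second evenly (for example 30 fps), integer-microsecond intervals alternate between 33333 and 33334 µs but never drift. A production renderer would likely await asynchronous render and encode calls inside the loop; because the timestamp does not depend on how long those calls take, the output timeline is unchanged.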

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Judson, Matthew D., "Deterministic Data Synthesis Pipeline for Multimodal AI Training (The TrueOS-KTKN Data Engine)", Technical Disclosure Commons, (December 02, 2025)
https://www.tdcommons.org/dpubs_series/8958

Technical Disclosure Commons

Defensive Publications Series

Deterministic Data Synthesis Pipeline for Multimodal AI Training (The TrueOS-KTKN Data Engine)
