Abstract

Disclosed is a system for generating "Golden Master" training datasets for multimodal Vision-Language-Action (VLA) models. The core innovation is a "Deterministic Render-Capture Loop" that decouples simulation time (simTime) from wall-clock time. On each iteration, the system advances the simulation by a fixed step ($\Delta t$), renders the resulting state, and captures the frame using WebCodecs, timestamping it with simTime regardless of how long the render actually took. Because video frames and telemetry records are stamped from the same simulation clock, the output has an exactly constant frame rate in simulation time and the frames stay synchronized with the telemetry logs, which eliminates the frame drops and jitter inherent in real-time screen recording. The pipeline is orchestrated within the TrueOS/Electron shell, which manages the simulation backend (e.g., a Python physics simulator) as a native child process connected over high-throughput IPC.
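
The loop described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the free-fall physics, the `render` stand-in, the 30 fps rate, and all function names are assumptions chosen for illustration, and plain Python lists stand in for the WebCodecs capture step. Timestamps are expressed in integer microseconds, the unit WebCodecs uses for `VideoFrame` timestamps, and are derived from the frame index with exact rational arithmetic so that no floating-point drift accumulates.

```python
from fractions import Fraction

FPS = 30                  # assumed target frame rate (hypothetical)
DT = Fraction(1, FPS)     # fixed simulation step, in seconds


def step(state, dt):
    """Toy physics (assumption): a point mass in free fall, semi-implicit Euler."""
    v = state["v"] - 9.81 * float(dt)
    y = state["y"] + v * float(dt)
    return {"y": y, "v": v}


def render(state):
    """Stand-in for a renderer; returns bytes that represent one frame."""
    return f"y={state['y']:.6f}".encode()


def capture_loop(num_frames):
    """Advance by DT, render, then stamp frame and telemetry with simTime.

    Wall-clock time is never read, so a slow render cannot change a timestamp.
    """
    state = {"y": 10.0, "v": 0.0}
    frames, telemetry = [], []
    for i in range(1, num_frames + 1):
        state = step(state, DT)
        # simTime after i steps, in whole microseconds (fraction truncated).
        sim_time_us = int(i * DT * 1_000_000)
        pixels = render(state)
        frames.append((sim_time_us, pixels))
        telemetry.append((sim_time_us, dict(state)))
    return frames, telemetry


if __name__ == "__main__":
    frames, telemetry = capture_loop(60)
    print(frames[29][0], frames[59][0])
```

In this sketch the 30th and 60th frames carry timestamps of exactly 1,000,000 and 2,000,000 microseconds, and each frame shares its timestamp with the telemetry record produced in the same iteration. Deriving the timestamp from the frame index, rather than accumulating a floating-point `dt`, is one way to keep the frame spacing exact over long recordings.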

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
