Many 360° videos captured with a camera rig mounted on a tripod contain artifacts related to the tripod to which the rig is attached. This disclosure describes automatic removal of such artifacts using machine learning (ML) techniques. Portions of the video that include artifacts are detected and removed automatically using one or more trained ML models. The models are trained to recognize production equipment either from videos captured in neutral environments or from markers placed on the production equipment. The detected artifacts are then automatically replaced based on the surrounding context, in a manner that keeps the resultant video stable.
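
The replacement step can be illustrated with a minimal sketch. It assumes that detection has already been done and that a per-frame boolean mask marks the pixels flagged as production equipment; the function names `spatial_fill` and `remove_artifacts`, the grayscale nested-list frame format, and the row-mean and temporal-median fill rules are all hypothetical stand-ins chosen for brevity, not the method of this disclosure. The sketch fills each flagged pixel from its spatial context and then takes a median over time so that the filled region does not flicker between frames.

```python
from statistics import mean, median


def spatial_fill(frame, mask):
    """Fill each masked pixel with the mean of the unmasked pixels in its row.

    frame: 2D list of grayscale values.
    mask:  2D list of bools, True where equipment was detected (assumed input).
    """
    unmasked = [v for row, mrow in zip(frame, mask)
                for v, m in zip(row, mrow) if not m]
    if not unmasked:
        raise ValueError("frame is entirely masked; no context to fill from")
    fallback = mean(unmasked)  # used when a whole row is masked
    out = []
    for row, mrow in zip(frame, mask):
        context = [v for v, m in zip(row, mrow) if not m]
        fill = mean(context) if context else fallback
        out.append([fill if m else v for v, m in zip(row, mrow)])
    return out


def remove_artifacts(frames, masks):
    """Replace masked pixels in every frame and keep the result stable over time."""
    # Step 1: context-based fill within each frame.
    filled = [spatial_fill(f, m) for f, m in zip(frames, masks)]
    # Step 2: for masked pixels, take the median over all frames of the
    # filled values at that location, so the patch does not flicker.
    result = []
    for frame, mask in zip(filled, masks):
        new_frame = []
        for y, (row, mrow) in enumerate(zip(frame, mask)):
            new_row = []
            for x, (v, m) in enumerate(zip(row, mrow)):
                if m:
                    v = median(f[y][x] for f in filled)
                new_row.append(v)
            new_frame.append(new_row)
        result.append(new_frame)
    return result


# Toy example: three 2x3 frames with a bright "tripod" pixel at row 1, column 1.
frames = [
    [[10, 10, 10], [10, 99, 10]],
    [[12, 12, 12], [12, 99, 12]],
    [[14, 14, 14], [14, 99, 14]],
]
masks = [[[False, False, False], [False, True, False]] for _ in frames]
cleaned = remove_artifacts(frames, masks)
```

In this toy input the flagged pixel receives the same value in every output frame, while unflagged pixels are left untouched. A practical system would likely substitute a learned inpainting model and account for the 360° projection, which this sketch ignores.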

