Abstract

The cost of creating large, accurately labeled datasets can challenge the pretraining of large multi-modal models, sometimes leading to the use of large datasets with noisy, machine-generated pseudo-labels. Some pretraining techniques may not effectively use the weak supervisory signal from these imperfect labels for certain downstream tasks. A system is described for iteratively pretraining a model using strong input masking. In this approach, a teacher model can generate pseudo-labels for a large dataset. A student model can then be trained to predict these labels from heavily masked inputs, for example, images with occluded patches and text with missing words. This process can be repeated, with the student model becoming the teacher for a subsequent iteration. The technique may improve a model's robustness to label noise and can be used to produce a shared model backbone for multiple tasks while potentially reducing reliance on large-scale, human-verified datasets.
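The iterative loop described above could be sketched as follows, assuming PyTorch. The model constructor, the patch-based masking function, the masking ratio, and the data loader are illustrative placeholders, not details from the original disclosure.

```python
# Minimal sketch, assuming PyTorch: teacher labels clean inputs, student learns
# those pseudo-labels from heavily masked inputs, and the student becomes the
# teacher for the next round. All names and hyperparameters are hypothetical.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def mask_inputs(images: torch.Tensor, mask_ratio: float = 0.75) -> torch.Tensor:
    """Occlude a large fraction of 16x16 image patches (strong input masking)."""
    b, c, h, w = images.shape
    ph, pw = h // 16, w // 16
    keep = (torch.rand(b, 1, ph, pw, device=images.device) > mask_ratio).float()
    keep = F.interpolate(keep, size=(h, w), mode="nearest")
    return images * keep


def pretrain_iteratively(model_fn, unlabeled_loader, num_iterations=3, epochs=1):
    """Iterative pseudo-label pretraining with strong input masking."""
    teacher = model_fn()  # initial teacher, e.g., trained on a small clean set
    for _ in range(num_iterations):
        student = model_fn()
        opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
        teacher.eval()
        for _ in range(epochs):
            for images in unlabeled_loader:
                with torch.no_grad():
                    # Teacher produces machine-generated pseudo-labels on unmasked inputs.
                    pseudo = teacher(images).argmax(dim=-1)
                # Student must predict the same labels from heavily masked inputs.
                logits = student(mask_inputs(images))
                loss = F.cross_entropy(logits, pseudo)
                opt.zero_grad()
                loss.backward()
                opt.step()
        # The trained student becomes the teacher for the next iteration.
        teacher = copy.deepcopy(student)
    return teacher
```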

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
