Abstract

This paper describes a technique for jointly training multiple neural network models to handle complex and diverse data distributions. The technique partitions the data into separate classes, with each model specializing on a specific subset, and uses a classifier-based gating architecture with a temperature-controlled softmax to assign inputs to models. The training process gradually transitions from uniform model contribution to specialized model selection, combining the models' predictions through weighted summation. This distributes different data patterns across multiple smaller neural networks while maintaining inference efficiency, since only a single model is selected at inference time. The technique is particularly valuable when the data exhibits significant variation that would otherwise require an extremely large single model.
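As a rough illustration of the mechanism described above (not the paper's implementation), a PyTorch sketch might look like the following. The module names, layer sizes, and annealing schedule are assumptions; only the general structure, several small expert models, a classifier gate with a temperature-controlled softmax, and weighted summation of predictions, reflects the abstract.

```python
# Minimal sketch, assuming a gating classifier over several small expert networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedEnsemble(nn.Module):
    def __init__(self, in_dim, out_dim, num_experts=4, hidden=64):
        super().__init__()
        # Several small expert networks, each intended to specialize on a data subset.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
            for _ in range(num_experts)
        )
        # Classifier that scores which expert should handle each input.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x, temperature=1.0):
        # Temperature-controlled softmax: high temperature gives near-uniform weights,
        # low temperature approaches one-hot selection of a single expert.
        weights = F.softmax(self.gate(x) / temperature, dim=-1)        # (B, E)
        preds = torch.stack([expert(x) for expert in self.experts], dim=1)  # (B, E, out)
        # Combine expert predictions through weighted summation.
        return (weights.unsqueeze(-1) * preds).sum(dim=1)

# Lowering the temperature over training moves the system from uniform model
# contribution toward specialized model selection; at inference a single expert
# can be picked via argmax over the gate scores.
model = GatedEnsemble(in_dim=16, out_dim=3)
x = torch.randn(8, 16)
for step in range(100):
    temperature = max(0.1, 5.0 * (0.97 ** step))  # assumed annealing schedule
    y_hat = model(x, temperature=temperature)
```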

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.