In a joint optimization model, knowledge distillation transfers information from a large, complex teacher model to small, lightweight student models. Dynamic knowledge distillation allows the student models to learn from the teacher model on the fly. However, the performance of a joint optimization model that uses dynamic knowledge distillation suffers if the teacher model contains too much noise from the negative labels, or does not carry enough information from the negative labels. This disclosure describes techniques to implement dynamic knowledge distillation by using temperature to control the amount of information about negative labels that is transmitted from the teacher model to the student model in a joint optimization model. A greater amount of information about the negative labels can be transmitted by setting the temperature high, while noise from the negative labels in the teacher model can be suppressed by setting the temperature low.
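The effect of temperature can be sketched with the standard temperature-scaled softmax used in knowledge distillation. The code below is a minimal illustration, not the disclosed implementation; the logit values are hypothetical. Dividing the teacher's logits by a high temperature flattens the output distribution, so the negative (non-target) labels carry more probability mass for the student to learn from; a low temperature sharpens the distribution toward the top label, suppressing negative-label noise.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by a temperature.

    Higher temperature -> flatter distribution (more mass on
    negative labels); lower temperature -> near one-hot output.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits; index 0 is the positive label.
teacher_logits = [6.0, 2.0, 1.0]

soft_high = softmax_with_temperature(teacher_logits, temperature=4.0)
soft_low = softmax_with_temperature(teacher_logits, temperature=0.5)

# Probability mass assigned to the negative labels (indices 1, 2):
neg_mass_high = sum(soft_high[1:])  # substantial at high temperature
neg_mass_low = sum(soft_low[1:])    # nearly zero at low temperature
```

Under this sketch, the student would be trained against the teacher's temperature-scaled outputs, with the temperature chosen (or adjusted dynamically) depending on whether the negative labels in the teacher are informative or noisy.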
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Anonymous, "Use of Temperature in Dynamic Knowledge Distillation for Joint Optimization Model", Technical Disclosure Commons, (November 18, 2020)