Abstract
Plateau-aware temperature selection is described for knowledge distillation of mixed-vocabulary models that include natural-language tokens and structured identifiers (SIDs). Teacher logits for SIDs are obtained from a forward pass on calibration data and analyzed without training a student model. A cold-collapse temperature boundary is estimated as a minimum temperature at which perplexity of a temperature-scaled SID softmax exceeds a threshold, and a soft-collapse temperature boundary is estimated as a maximum temperature at which a discriminability measure based on probability ratios exceeds a threshold. A SID distillation temperature is selected from the plateau between the boundaries, including selecting a geometric mean of the boundaries, and used in a distillation objective that may apply separate temperatures for natural-language tokens and SIDs. A lambda-invariance diagnostic may indicate soft collapse when varying a SID loss weight does not change tail metrics.
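The boundary estimation and plateau selection summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the function names, the threshold defaults, the use of mean perplexity over calibration positions, and the choice of the median top-1/top-2 log probability ratio as the discriminability measure are all hypothetical. The abstract specifies only that the measures are perplexity and a probability-ratio-based quantity, each compared against a threshold.

```python
import numpy as np


def mean_perplexity(sid_logits, temperature):
    """Mean perplexity (exp of entropy) of the temperature-scaled SID softmax."""
    z = sid_logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    p = np.exp(z)
    p = p / p.sum(axis=-1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return float(np.exp(entropy).mean())


def median_log_top2_ratio(sid_logits, temperature):
    """Median of log(p_top1 / p_top2), which equals (z_top1 - z_top2) / T.

    Assumed discriminability measure; the source only says it is based on
    probability ratios.
    """
    top2 = np.partition(sid_logits, -2, axis=-1)[:, -2:]
    gap = top2[:, 1] - top2[:, 0]  # largest logit minus second largest
    return float(np.median(gap / temperature))


def select_sid_temperature(sid_logits, temps, ppl_threshold=2.0, ratio_threshold=1.5):
    """Return (t_cold, t_soft, t_sid) from teacher SID logits on a temperature grid.

    t_cold: minimum grid temperature whose perplexity exceeds ppl_threshold.
    t_soft: maximum grid temperature whose probability ratio exceeds ratio_threshold.
    t_sid:  geometric mean of the two boundaries.
    Threshold defaults are illustrative assumptions, not values from the source.
    """
    temps = np.sort(np.asarray(temps, dtype=float))
    above_cold = [t for t in temps if mean_perplexity(sid_logits, t) > ppl_threshold]
    below_soft = [
        t for t in temps
        if median_log_top2_ratio(sid_logits, t) > np.log(ratio_threshold)
    ]
    if not above_cold or not below_soft:
        raise ValueError("a boundary was not found on the temperature grid")
    t_cold, t_soft = float(min(above_cold)), float(max(below_soft))
    if t_cold > t_soft:
        raise ValueError("no plateau: cold boundary exceeds soft boundary")
    return t_cold, t_soft, float(np.sqrt(t_cold * t_soft))
```

The sketch needs only teacher logits from a forward pass on calibration data, so no student model is trained. On a logarithmic grid, the geometric mean places the selected temperature midway between the two boundaries in log-temperature space.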
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Temperature Selection for Knowledge Distillation in Mixed-Vocabulary Models", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10629