Defensive Publications Series

Enhancing Model Precision with Quality-Aware Label Selection

Abstract

This disclosure presents a methodology for improving the precision of machine learning models used in high-stakes domains, specifically systems that defend against malicious advertising activity (AdSpam defense). The approach addresses the noisy and incomplete positive training labels that are common in real-world datasets. It systematically refines the positive label set, retaining only the most reliable and consistently represented spam patterns. To do so, it combines model-based explainability, specifically SHapley Additive exPlanations (SHAP), with feature-based clustering to construct a high-quality training dataset. This process mitigates the negative effects of ambiguous labels, unknown negative examples, and sparsely represented long-tail spam patterns. Initial experiments suggest that substantial gains in model performance are possible, exemplified by an AUC-ROC increase from 0.906 to 0.998.
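
The selection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and threshold in it is hypothetical. Two simplifying assumptions are made. First, the model is a linear scorer with independent features, for which SHAP values have the closed form phi_j(x) = w_j * (x_j - mean_j), so no external SHAP library is needed. Second, a crude grouping by each positive example's most influential feature stands in for the feature-based clustering, whose details the abstract does not specify. Positives that fall into groups smaller than `min_cluster_size` are treated as long-tail patterns and dropped from the training set.

```python
from collections import defaultdict
from statistics import mean


def linear_shap(weights, rows):
    """SHAP values for a linear model, assuming independent features:
    phi_j(x) = w_j * (x_j - mean_j), with means taken over `rows`."""
    n_features = len(weights)
    means = [mean(r[j] for r in rows) for j in range(n_features)]
    return [
        [weights[j] * (r[j] - means[j]) for j in range(n_features)]
        for r in rows
    ]


def select_reliable_positives(weights, rows, labels, top_k=1, min_cluster_size=2):
    """Group positive examples by their top_k most spam-pushing features
    (a stand-in for clustering) and keep only well-populated groups."""
    attributions = linear_shap(weights, rows)
    clusters = defaultdict(list)
    for idx, (phi, label) in enumerate(zip(attributions, labels)):
        if label != 1:
            continue  # only the positive label set is refined
        ranked = sorted(range(len(phi)), key=lambda j: phi[j], reverse=True)
        clusters[frozenset(ranked[:top_k])].append(idx)
    kept = sorted(
        idx
        for members in clusters.values()
        if len(members) >= min_cluster_size
        for idx in members
    )
    return kept, dict(clusters)


# Hypothetical toy data: three positives share one pattern (feature 0 is
# high), one positive is an isolated pattern (feature 1 is high).
weights = [2.0, 1.5, 0.1]
rows = [
    [5.0, 0.0, 0.0], [6.0, 0.0, 1.0], [5.0, 1.0, 0.0],  # positives, pattern A
    [0.0, 6.0, 0.0],                                    # positive, lone pattern
    [0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

kept, clusters = select_reliable_positives(weights, rows, labels)
print(kept)  # [0, 1, 2]
```

In this toy run the isolated positive at index 3 is excluded from the refined label set. A real system would presumably compute attributions from the production model and cluster on the full attribution or feature vectors rather than on a single dominant feature.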

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Liu, Fei and Zhao, Manqi, "Enhancing Model Precision with Quality-Aware Label Selection", Technical Disclosure Commons, (March 19, 2026)
https://www.tdcommons.org/dpubs_series/9567
