Machine learning (ML) models trained for various purposes are generally kept confidential, e.g., because of their commercial value or the proprietary nature of their training data. Commercial cloud-based ML service providers therefore protect their ML models even while offering customers services that use those models. For example, a service may let a customer upload an observation, e.g., an image, and receive a label for that observation generated by an ML model trained to label images. Recent research has shown that, given a sufficient number of observations and the labels returned for them, it is possible to reverse engineer the ML model that generated the labels. This disclosure presents techniques that thwart such reverse-engineering efforts, e.g., by adversarial actors, by returning a near-true label instead of the true class label for a small fraction of input queries.
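The abstract does not say how a near-true label is chosen or how the perturbed queries are selected, so the following is only a minimal sketch under stated assumptions. It wraps a scoring model and, for a configurable fraction of queries, returns the runner-up class rather than the top class. It assumes "near-true" means the second-highest-scoring class, and it selects queries by hashing each input with a server-side secret. The hash makes the choice deterministic, so resubmitting the same observation returns the same answer and an attacker cannot recover the true label by majority vote over repeated queries. The names `make_guarded_classifier`, `predict_scores`, `perturb_fraction`, and `secret` are hypothetical and do not come from the disclosure.

```python
import hashlib
import hmac
from typing import Callable, Sequence


def make_guarded_classifier(
    predict_scores: Callable[[bytes], Sequence[float]],
    labels: Sequence[str],
    perturb_fraction: float = 0.05,
    secret: bytes = b"server-side-secret",
) -> Callable[[bytes], str]:
    """Wrap a scoring model so a small fraction of queries get a near-true label.

    Hypothetical sketch: "near-true" is assumed to mean the runner-up class,
    and which queries are perturbed is decided by a keyed hash of the input.
    """
    if not 0.0 <= perturb_fraction <= 1.0:
        raise ValueError("perturb_fraction must be in [0, 1]")

    def classify(observation: bytes) -> str:
        scores = list(predict_scores(observation))
        # Class indices ordered from most to least likely.
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        # Keyed hash of the input -> 53-bit value -> u in [0, 1).
        # Identical observations always yield the same u.
        digest = hmac.new(secret, observation, hashlib.sha256).digest()
        u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        if u < perturb_fraction and len(ranked) > 1:
            return labels[ranked[1]]  # near-true: runner-up class
        return labels[ranked[0]]  # true: top-scoring class

    return classify


if __name__ == "__main__":
    # Hypothetical stand-in for the confidential model: fixed scores.
    def toy_scores(obs: bytes) -> list:
        return [0.7, 0.2, 0.1]

    guarded = make_guarded_classifier(toy_scores, ["cat", "dog", "fox"], 0.05)
    answers = [guarded(str(i).encode()) for i in range(1000)]
    print("near-true answers:", answers.count("dog"), "of", len(answers))
```

Choosing the runner-up class keeps each perturbed answer plausible for a legitimate customer. Because roughly `perturb_fraction` of the answers are deliberately wrong, an attacker training a surrogate model on the returned labels works from noisy data. The keyed hash keeps that noise consistent from one request to the next.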
This work is licensed under a Creative Commons Attribution 4.0 License.
Eidem, Laura and Jacobson, Alex, "Preventing reverse engineering of black-box classifiers", Technical Disclosure Commons, (November 08, 2018)