Abstract

Large language models (LLMs) and other machine learning models are trained on large amounts of curated text data, including both public and private datasets of variable quality. Data collection, cleaning, deduplication, and filtering are performed to build appropriate training datasets. However, such operations cannot protect the trained model against data poisoning (i.e., the intentional corruption of training data) that attempts to manipulate or compromise the behavior of the model. This disclosure describes techniques to improve data security and integrity of the training dataset for LLMs via data validation of a subset (or all) of the data points within the dataset available for training. A data validation policy configuration (specified by the entity that is training and/or tuning the model) is used to determine a level of confidence in the correctness of the data by validating it against different sources. Data that is flagged during validation can be marked/labeled as less reliable or can be excluded during model training. Model responses can include metadata that indicates a data confidence score for each data point in the response.
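The policy-driven validation flow summarized above can be illustrated with a minimal sketch. The Python snippet below is not part of the disclosure; the names (ValidationPolicy, score_record, prepare_training_data) and the toy validators are hypothetical stand-ins for the policy configuration, reference-source checks, confidence threshold, and flag/exclude behavior that the training entity would actually supply.

```python
# Minimal sketch (illustrative only) of policy-driven data validation.
# All identifiers here are hypothetical; real validators would check a
# record against external reference sources chosen by the training entity.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class ValidationPolicy:
    """Validation policy configuration supplied by the entity training/tuning the model."""
    validators: list[Callable[[str], bool]]  # checks against different sources
    min_confidence: float = 0.6              # below this, the record is flagged
    exclude_flagged: bool = False            # drop flagged records vs. label them

def score_record(text: str, policy: ValidationPolicy) -> float:
    """Confidence = fraction of configured sources that corroborate the record."""
    if not policy.validators:
        return 0.0
    agree = sum(1 for check in policy.validators if check(text))
    return agree / len(policy.validators)

def prepare_training_data(records: Iterable[str], policy: ValidationPolicy):
    """Attach confidence metadata; flag or exclude records per the policy."""
    for text in records:
        confidence = score_record(text, policy)
        flagged = confidence < policy.min_confidence
        if flagged and policy.exclude_flagged:
            continue  # excluded from the training dataset entirely
        yield {"text": text, "confidence": confidence, "flagged": flagged}

# Example usage with two toy "sources" (placeholders for real reference checks):
policy = ValidationPolicy(
    validators=[lambda t: "earth" in t.lower(), lambda t: len(t) > 10],
    min_confidence=0.5,
)
for item in prepare_training_data(["The Earth orbits the Sun.", "xyz"], policy):
    print(item)
```

The per-record confidence values produced this way are the same scores that, per the disclosure, can later be surfaced as metadata alongside data points cited in model responses.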

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
