Large language models (LLMs) and other machine learning models are trained on large amounts of curated text data drawn from public and private datasets of variable quality. Training datasets are built through data collection, cleaning, deduplication, and filtering. However, these operations cannot protect the trained model against data poisoning, i.e., the intentional corruption of training data to manipulate or compromise the model's behavior. This disclosure describes techniques to improve the security and integrity of LLM training datasets by validating a subset (or all) of the data points available for training. A data validation policy configuration, specified by the entity that trains and/or tunes the model, is used to determine a level of confidence in the correctness of the data by validating it against different sources. Data flagged during validation can be labeled as less reliable or excluded from model training. Model responses can include metadata that indicates a data confidence score for each data point in the response.
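
The validation flow described above can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: the names `ValidationPolicy`, `ScoredPoint`, `confidence`, and `validate_dataset`, the policy fields, and the scoring rule (the fraction of sources that corroborate a data point) are all assumptions chosen for illustration. Each "source" is modeled as a hypothetical callable that returns whether it corroborates a given text.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical: a source is any callable that says whether it corroborates a text.
Source = Callable[[str], bool]


@dataclass
class ValidationPolicy:
    """Illustrative policy set by the entity training or tuning the model."""
    sample_fraction: float = 1.0   # share of data points to validate (1.0 = all)
    min_confidence: float = 0.5    # scores below this are flagged
    exclude_flagged: bool = False  # True: drop flagged points; False: keep and label


@dataclass
class ScoredPoint:
    text: str
    confidence: Optional[float]  # None when the point was not sampled for validation
    flagged: bool


def confidence(text: str, sources: List[Source]) -> float:
    """Assumed scoring rule: fraction of sources that corroborate the text."""
    if not sources:
        raise ValueError("at least one validation source is required")
    return sum(1 for source in sources if source(text)) / len(sources)


def validate_dataset(
    data: List[str],
    sources: List[Source],
    policy: ValidationPolicy,
    seed: int = 0,
) -> List[ScoredPoint]:
    """Validate a sampled subset of the data and label or exclude flagged points."""
    rng = random.Random(seed)
    result: List[ScoredPoint] = []
    for text in data:
        # Skip points outside the sampled subset; they stay unscored.
        if rng.random() >= policy.sample_fraction:
            result.append(ScoredPoint(text, None, False))
            continue
        score = confidence(text, sources)
        flagged = score < policy.min_confidence
        if flagged and policy.exclude_flagged:
            continue  # excluded from the training set
        result.append(ScoredPoint(text, score, flagged))
    return result
```

In this sketch, the per-point `confidence` value is what could later be surfaced as response metadata. How a real system would query sources, weight them, or attach scores to model output is not specified here and would depend on the deployment.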

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.