Abstract

Owners and providers of artificial intelligence (AI) or machine learning (ML) models have an interest in tracking the datasets used to train their models, for compliance or troubleshooting purposes, and in ensuring that high-quality datasets are used for training. Dataset creators or owners, in turn, have an interest in knowing whether their datasets are being used to train AI models. This disclosure describes blockchain techniques that capture and maintain the lineage and integrity (tamper-evidence) of the training datasets used for specific versions of AI, ML, and generative-AI models. For a given training dataset, its hash is packaged into a container together with a link to the dataset's location and its usage policy, and the container is uploaded to a blockchain ledger. The transparent, append-only blockchain record enables model owners to track their training datasets and comply with usage policies, and enables content owners to detect use of their content. The evolution of an AI model can be tracked against its training datasets, enabling dataset optimization and de-biasing as well as reproducible AI-model behavior.
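The flow described in the abstract (hash the dataset, package the hash with the dataset's location and usage policy, append the package to an append-only ledger) can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: an in-memory SHA-256 hash chain stands in for a real blockchain ledger, and the `DatasetContainer` field names, the `Ledger` class and its methods, and the example.com URLs are all hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict


def dataset_hash(data: bytes) -> str:
    """SHA-256 digest of the raw dataset bytes (its integrity fingerprint)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DatasetContainer:
    # Hypothetical field names for the hash, location link, and usage policy,
    # plus the model version the dataset was used to train.
    dataset_sha256: str
    location: str
    usage_policy: str
    model_version: str


@dataclass
class Block:
    index: int
    timestamp: float
    payload: dict
    prev_hash: str

    def digest(self) -> str:
        # Canonical (key-sorted) JSON so identical blocks always hash identically.
        body = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(body).hexdigest()


class Ledger:
    """In-memory, append-only hash chain standing in for a real blockchain."""

    def __init__(self) -> None:
        self.blocks: list[Block] = [Block(0, 0.0, {"genesis": True}, "0" * 64)]

    def append(self, container: DatasetContainer) -> Block:
        # Each new block records the digest of the block before it.
        prev = self.blocks[-1]
        block = Block(len(self.blocks), time.time(), asdict(container), prev.digest())
        self.blocks.append(block)
        return block

    def verify_chain(self) -> bool:
        # Tamper-evidence: every block must reference its predecessor's current digest.
        return all(
            self.blocks[i].prev_hash == self.blocks[i - 1].digest()
            for i in range(1, len(self.blocks))
        )

    def datasets_for(self, model_version: str) -> list[dict]:
        # Lineage query: all containers recorded for one model version.
        return [b.payload for b in self.blocks[1:]
                if b.payload.get("model_version") == model_version]

    def is_recorded(self, data: bytes) -> bool:
        # Content-owner check: was this exact dataset recorded for any model?
        h = dataset_hash(data)
        return any(b.payload.get("dataset_sha256") == h for b in self.blocks[1:])


if __name__ == "__main__":
    ledger = Ledger()
    train_v1 = b"id,text\n1,hello\n2,world\n"
    ledger.append(DatasetContainer(dataset_hash(train_v1),
                                   "https://example.com/datasets/train-v1.csv",
                                   "research-only", "model-1.0"))
    ledger.append(DatasetContainer(dataset_hash(b"id,text\n3,again\n"),
                                   "https://example.com/datasets/train-v2.csv",
                                   "research-only", "model-1.1"))
    print(ledger.verify_chain())          # True
    print(ledger.is_recorded(train_v1))   # True
    ledger.blocks[1].payload["usage_policy"] = "commercial"  # simulated tampering
    print(ledger.verify_chain())          # False
```

In this sketch, editing a block breaks the link stored in its successor, which is what makes the record tamper-evident; an edit to the newest block is only exposed once a later block anchors its digest. A production ledger would add distributed consensus and signatures on top of this linking, which the sketch deliberately omits.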

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Porter, Nelly and Kolga, Rene, "Blockchain Ledgers for Tracking Datasets Used in Training AI Models", Technical Disclosure Commons, (February 12, 2024)
https://www.tdcommons.org/dpubs_series/6682

Technical Disclosure Commons

Defensive Publications Series

Blockchain Ledgers for Tracking Datasets Used in Training AI Models
