Owners and providers of artificial intelligence (AI) or machine learning (ML) models have an interest in tracking the datasets used to train their models, both for compliance and troubleshooting and to ensure that high-quality datasets are used. Dataset creators or owners, in turn, have an interest in knowing whether their dataset is being used to train an AI model. This disclosure describes blockchain techniques that capture and maintain the lineage and integrity (tamper-evidence) of the training datasets used for specific versions of AI, ML, or generative-AI models. Given a training dataset, its hash is placed in a container along with a link to the dataset's location and its usage policy, and the container is uploaded to a blockchain ledger. The transparent, append-only blockchain record enables model owners to track their training datasets and comply with usage policies, and enables content owners to detect use of their content. The evolution of an AI model can be tracked against its training datasets, enabling dataset optimization and de-biasing as well as reproducible AI-model behavior.
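The flow described above (hash the dataset, place the hash in a container with the dataset location and usage policy, append the container to a ledger) can be illustrated with a minimal sketch. It rests on several assumptions: a simple in-memory hash chain stands in for a real blockchain ledger, SHA-256 is chosen as the hash function, and all names (`make_container`, `Ledger`, `datasets_for`, `find_usage`, the field names, and the example URL) are hypothetical and not taken from the disclosure.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous block"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_container(dataset: bytes, location: str, usage_policy: str,
                   model_version: str) -> dict:
    """Bundle the dataset hash with its location, policy, and model version."""
    return {
        "dataset_hash": sha256_hex(dataset),
        "location": location,
        "usage_policy": usage_policy,
        "model_version": model_version,
    }


def _block_hash(body: dict) -> str:
    """Hash a block body using a canonical (sorted-key) JSON encoding."""
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class Ledger:
    """In-memory, append-only hash chain standing in for a blockchain."""
    blocks: list = field(default_factory=list)

    def append(self, container: dict) -> dict:
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else GENESIS_HASH
        body = {"index": len(self.blocks), "prev_hash": prev_hash,
                "container": container}
        block = {**body, "block_hash": _block_hash(body)}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash and link; False means the record was altered."""
        prev_hash = GENESIS_HASH
        for i, block in enumerate(self.blocks):
            body = {k: block[k] for k in ("index", "prev_hash", "container")}
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            if _block_hash(body) != block["block_hash"]:
                return False
            prev_hash = block["block_hash"]
        return True

    def datasets_for(self, model_version: str) -> list:
        """Model-owner view: containers recorded for one model version."""
        return [b["container"] for b in self.blocks
                if b["container"]["model_version"] == model_version]

    def find_usage(self, dataset: bytes) -> list:
        """Dataset-owner view: model versions recorded as using this dataset."""
        digest = sha256_hex(dataset)
        return [b["container"]["model_version"] for b in self.blocks
                if b["container"]["dataset_hash"] == digest]


if __name__ == "__main__":
    ledger = Ledger()
    data = b"example training records"
    ledger.append(make_container(data, "https://example.com/data/v1",
                                 "CC-BY-4.0", "model-1.0"))
    print(ledger.verify())
    print(ledger.find_usage(data))
```

In this sketch, `verify` provides the tamper-evidence: editing any recorded container changes its block hash and breaks the chain check. `datasets_for` and `find_usage` correspond to the two viewpoints in the text, the model owner tracking datasets per model version and the dataset owner detecting use of their content. A production system would replace `Ledger` with a real distributed ledger and its consensus mechanism.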

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.