Abstract

Developers use interactive development environments (IDEs) to create and share documents that contain live code, equations, visualizations, narrative text, etc. as part of the artificial intelligence/ machine learning (AI/ML) development process. Virtual machines that run IDEs may have access to private and/or sensitive data used during model training or use. For data security and compliance, it is necessary to highlight and track the VMs that have been in contact with sensitive information. This disclosure describes techniques to automatically identify and label the presence of sensitive data in virtual machines and disks as part of machine learning. Custom VM images are provided that include data scanning scripts that can identify the presence of sensitive data during or after usage, e.g., by a developer using an IDE. The scripts can automatically log the presence of data and generate alerts. Users of such virtual machines are provided additional controls to perform the training process in a secure and confidential manner in compliance with applicable data regulations.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS