Abstract

A collection of code bases that includes source-destination pairs of code, each translated from a source environment to a destination environment, can be highly valuable for training artificial intelligence (AI) or machine learning (ML) models. However, a code base may include private or sensitive information, such as variable names specific to a particular party, which makes it unsuitable for such use. This disclosure describes techniques to automatically remove sensitive information from code so that the code can be used as training data for AI/ML models. Source-destination pairs of translated code are transformed into their corresponding abstract syntax trees (ASTs). The ASTs are anonymized such that they retain the syntactic structure of the code while excising semantic information. The resulting anonymized ASTs (AASTs) of source-destination code pairs can serve as a safe, shared corpus of data for training AI/ML models.
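The anonymization step can be illustrated with a minimal sketch. It is not the disclosure's implementation: it assumes the code is Python and uses the standard-library `ast` module as a stand-in for whatever parsers the source and destination environments require. The class `Anonymizer`, the function `anonymize`, and the placeholder scheme (`ID0`, `ID1`, `STR`) are hypothetical names chosen for illustration. The sketch also renames built-in names, leaves numeric literals untouched, and does not handle class names, imports, or keyword arguments, all of which a fuller treatment would need to address.

```python
import ast


class Anonymizer(ast.NodeTransformer):
    """Replace identifiers and string literals with neutral placeholders."""

    def __init__(self):
        self.names = {}  # original identifier -> placeholder

    def _alias(self, name):
        # The same original name always maps to the same placeholder,
        # so references stay consistent within one tree.
        if name not in self.names:
            self.names[name] = f"ID{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

    def visit_Attribute(self, node):
        node.attr = self._alias(node.attr)
        self.generic_visit(node)
        return node

    def visit_Constant(self, node):
        # String literals (including docstrings) may carry sensitive text.
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value="STR"), node)
        return node


def anonymize(source):
    """Parse source code and return a textual dump of its anonymized AST."""
    tree = Anonymizer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.dump(tree)


def anonymize_pair(source_code, destination_code):
    """Anonymize both sides of a source-destination pair independently."""
    return anonymize(source_code), anonymize(destination_code)
```

In this sketch, two snippets that differ only in their identifiers produce the same dump, which reflects the idea that the anonymized tree keeps syntactic structure while dropping party-specific names.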

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

n/a, "Transforming Code to Anonymized Abstract Syntax Trees for AI/ML Model Training", Technical Disclosure Commons, (March 08, 2024)
https://www.tdcommons.org/dpubs_series/6767


Technical Disclosure Commons

Defensive Publications Series

Transforming Code to Anonymized Abstract Syntax Trees for AI/ML Model Training
