Abstract

When analyzing source code archives of unknown origin, it can be helpful to determine where the code originated geographically. Comments in these files are useful clues, as they are often written in the developer's native natural language. Identifying that language can help with understanding the flow of the code (for example, by translating the comments) and with establishing provenance. By analyzing the contents of a file and determining which character sets its characters belong to, a better guess can be made.
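The character-set idea can be illustrated with a minimal Python sketch. It is not the method of the disclosure itself, whose details are not given in this abstract. It assumes that the first word of a character's Unicode name (for example `CYRILLIC` or `CJK`) is a good enough proxy for its script, and the function names `script_histogram` and `guess_script` are hypothetical.

```python
import unicodedata
from collections import Counter


def script_histogram(text):
    """Count non-ASCII letters per script-like label.

    Assumption: the first word of the Unicode character name
    (e.g. 'CYRILLIC', 'HIRAGANA', 'CJK') approximates the script.
    """
    counts = Counter()
    for ch in text:
        # ASCII appears in nearly all source code, so it carries no signal.
        if ord(ch) < 128 or not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split(" ")[0]] += 1
    return counts


def guess_script(text):
    """Return the most frequent label, or None if the text is pure ASCII."""
    counts = script_histogram(text)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Calling `guess_script` on the decoded contents of a file yields a label such as `CYRILLIC`, which narrows the candidate languages to a class rather than a single language. A pure-ASCII file gives `None`, so this sketch cannot tell English from, say, unaccented Dutch.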

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Hemel, Armijn, "Recognizing a natural language or language class in source code files", Technical Disclosure Commons, (January 22, 2019)
https://www.tdcommons.org/dpubs_series/1898

Technical Disclosure Commons

Defensive Publications Series

Recognizing a natural language or language class in source code files
