When analyzing source code archives of unknown origin, it can be helpful to find out where the code originated geographically. Comments in these files can help, as they are often written in the developer's native natural language. Identifying the language used in a file can aid in understanding the flow of the code (for example, by translating comments) and in establishing provenance. By analyzing the contents of a file and determining which character sets its characters belong to, a better guess can be made.
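As an illustration of the character-set idea, the following is a minimal Python sketch and not the method described in the disclosure itself. It assumes that the first word of a character's Unicode name (for example "CYRILLIC" or "CJK") is a usable stand-in for its script. The function names script_histogram and dominant_non_latin_script are hypothetical and chosen only for this example.

```python
import unicodedata
from collections import Counter


def script_histogram(text):
    """Count alphabetic characters per script.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC", "CJK") approximates the script.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        try:
            name = unicodedata.name(ch)
        except ValueError:
            continue  # character has no name in the Unicode database
        counts[name.split(" ")[0]] += 1
    return counts


def dominant_non_latin_script(text):
    """Return the most frequent non-Latin script label, or None.

    Latin is ignored because identifiers and keywords in most
    programming languages are written in Latin characters.
    """
    counts = script_histogram(text)
    counts.pop("LATIN", None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    sample = "int x = 0; // Привет мир"
    print(script_histogram(sample))
    print(dominant_non_latin_script(sample))
```

A result such as "CYRILLIC" points only to a language class (Russian, Ukrainian, Bulgarian and others share the script), so it narrows the guess rather than naming a specific language. A file containing only Latin characters yields None, and telling Latin-script languages apart would need a different technique.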
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Hemel, Armijn, "Recognizing a natural language or language class in source code files", Technical Disclosure Commons, (January 22, 2019)