This disclosure describes techniques for highlighting code snippets in text documents edited in a word processor. A text block containing code is received and analyzed by a tokenizer, which identifies the individual words in the block. The tokenizer classifies each word into one of a finite set of types (categories) by matching it against lists of words defined for different computer languages. Words or characters are then colorized based on type, for example, whether a word is a language-specific reserved keyword or a user-defined identifier. Multiple coding languages can be supported with low maintenance, since adding a language only requires updating the active dictionary of reserved words. The techniques can support live updates, highlighting code even as the user enters text. Incremental highlighting can be implemented with little additional effort by analyzing only a small block of code near the altered character(s).
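The dictionary-driven approach summarized above can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: the names `RESERVED_WORDS`, `COLORS`, `tokenize`, `highlight`, and `rehighlight_line`, the token pattern, the color choices, and the tiny keyword lists are all assumptions made for illustration. The sketch also assumes that "a small block of code near the altered text" can be approximated by the single line containing the edit, which would not be enough for constructs that span lines, such as block comments.

```python
import re
from typing import List, Tuple

# Hypothetical per-language dictionaries of reserved words (tiny illustrative subsets).
# Supporting another language would only require adding another entry here.
RESERVED_WORDS = {
    "python": {"def", "return", "if", "else", "for", "while", "import"},
    "c": {"int", "void", "return", "if", "else", "for", "while"},
}

# Hypothetical mapping from token type to display color.
COLORS = {"keyword": "blue", "identifier": "black", "number": "green", "other": "gray"}

# Assumed token pattern: words, simple numbers, or any other non-space character.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z_]\w*)|(?P<number>\d+(?:\.\d+)?)|(?P<other>\S)"
)


def tokenize(text: str, language: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for each token, using the language's dictionary."""
    reserved = RESERVED_WORDS[language]
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "word":
            # A word is a keyword only if the active dictionary lists it.
            kind = "keyword" if match.group() in reserved else "identifier"
        tokens.append((match.start(), match.end(), kind))
    return tokens


def highlight(text: str, language: str) -> List[Tuple[str, str]]:
    """Pair each token's text with the color assigned to its type."""
    return [(text[s:e], COLORS[kind]) for s, e, kind in tokenize(text, language)]


def rehighlight_line(text: str, language: str, edit_pos: int) -> List[Tuple[int, int, str]]:
    """Incremental update: re-tokenize only the line that contains edit_pos."""
    line_start = text.rfind("\n", 0, edit_pos) + 1
    line_end = text.find("\n", edit_pos)
    if line_end == -1:
        line_end = len(text)
    line_tokens = tokenize(text[line_start:line_end], language)
    # Shift offsets back into whole-document coordinates.
    return [(line_start + s, line_start + e, kind) for s, e, kind in line_tokens]
```

In this sketch, `highlight("def f(x): return 1", "python")` colors `def` and `return` as keywords, while the same text tokenized with the `"c"` dictionary treats `def` as an ordinary identifier. The tokenizer itself is unchanged between languages; only the dictionary lookup differs.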
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Ben Noon, Barak; Galante, Gregory George; Hariri, Behnoosh; Aberbach, Tomer; Kaplan, Blake; and Cahill, Emily, "Language Agnostic Code Highlighting in Word Processors", Technical Disclosure Commons, (June 16, 2022)