Abstract

An n-gram is a contiguous sequence of N items, such as words or characters, taken from a text. Document processing uses n-grams to summarize a document as the set of text fragments it contains, which supports applications such as indexing, clustering, and machine learning. This disclosure describes techniques to efficiently extract the n-grams of a given length from a grammar specified as a nondeterministic finite automaton (NFA) with ε-moves. The algorithm uses O(N) graph traversals to compute the n-grams of length N from the grammar.
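To make the idea concrete, the following is a minimal sketch of one way to collect n-grams from an NFA with ε-moves using one pass over the edges per symbol. It is not taken from the disclosure itself. The edge-list representation, the function names `epsilon_closure` and `extract_ngrams`, and the use of `None` as the ε label are all assumptions made for illustration. The sketch also assumes the automaton is trimmed, meaning every state lies on some accepting path, so that every path of N labeled edges corresponds to an n-gram of the grammar.

```python
from collections import defaultdict


def epsilon_closure(edges, states):
    """Map each state to the set of states reachable from it by ε-moves only."""
    eps = defaultdict(set)
    for src, label, dst in edges:
        if label is None:  # assumption: None marks an ε-move
            eps[src].add(dst)
    closure = {}
    for start in states:
        seen, stack = {start}, [start]
        while stack:
            state = stack.pop()
            for nxt in eps[state]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure


def extract_ngrams(edges, n):
    """Return the set of label tuples of length n found along paths of the NFA.

    edges is a list of (src, label, dst) triples (a hypothetical encoding).
    """
    states = {s for src, _, dst in edges for s in (src, dst)}
    closure = epsilon_closure(edges, states)
    labeled = [e for e in edges if e[1] is not None]

    # ending[s] holds the k-grams whose last labeled edge arrives at state s.
    # For k = 0 every state carries the empty gram.
    ending = {s: {()} for s in states}
    for _ in range(n):  # one pass over the edges per symbol
        # Carry each gram forward across ε-moves.
        reach = defaultdict(set)
        for state, grams in ending.items():
            for target in closure[state]:
                reach[target] |= grams
        # Extend each gram by one labeled edge.
        extended = defaultdict(set)
        for src, label, dst in labeled:
            for gram in reach[src]:
                extended[dst].add(gram + (label,))
        ending = extended
    return set().union(*ending.values())


# Hypothetical grammar for "a (b | c) d", with an ε-move between states 1 and 2.
edges = [(0, "a", 1), (1, None, 2), (2, "b", 3), (2, "c", 3), (3, "d", 4)]
print(sorted(extract_ngrams(edges, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
```

The sets stored per state can grow large for highly ambiguous grammars. This sketch favors clarity over the efficiency the disclosure targets.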

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Boulgakov, Alexandre, "Efficient Extraction of n-grams From a Grammar", Technical Disclosure Commons, (October 29, 2020)
https://www.tdcommons.org/dpubs_series/3721
