This disclosure describes computational linguistics techniques for analyzing software input patterns and test coverage. Structured input data, which can have an arbitrary and evolving schema, are obtained from production software and from testbeds and tokenized using tree traversal to generate a vocabulary, unigram statistics, and bags of words (BoWs). The BoWs are subjected to statistical analysis to programmatically discover software usage patterns in production, to identify test coverage, and to flag gaps in testing.
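The pipeline summarized above can be illustrated with a minimal sketch. It is not the disclosure's implementation: the token format (a slash-separated path plus the leaf value), the function names `tokenize`, `bag_of_words`, and `coverage_gaps`, and the sample records are all assumptions made for illustration, and the "statistical analysis" is reduced here to a simple vocabulary comparison between production and test inputs.

```python
from collections import Counter


def tokenize(node, path=""):
    """Depth-first traversal of nested dicts/lists, yielding one token per leaf.

    The token format "<path>=<value>" is a hypothetical choice; it needs no
    fixed schema, so new or renamed fields simply produce new tokens.
    """
    if isinstance(node, dict):
        for key in sorted(node):
            yield from tokenize(node[key], f"{path}/{key}")
    elif isinstance(node, list):
        for item in node:
            yield from tokenize(item, f"{path}[]")
    else:
        yield f"{path}={node!r}"


def bag_of_words(record):
    """Bag of words for one structured input: token -> occurrence count."""
    return Counter(tokenize(record))


def coverage_gaps(production, tests):
    """Return tokens seen in production but absent from every test input,
    mapped to their production unigram counts."""
    prod_counts = Counter()
    for record in production:
        prod_counts.update(bag_of_words(record))
    test_vocab = set()
    for record in tests:
        test_vocab.update(bag_of_words(record))
    return {tok: n for tok, n in prod_counts.items() if tok not in test_vocab}


# Hypothetical sample inputs, for illustration only.
production = [
    {"mode": "fast", "retries": 3},
    {"mode": "safe", "retries": 3},
    {"mode": "fast", "options": {"cache": True}},
]
tests = [{"mode": "fast", "retries": 3}]

gaps = coverage_gaps(production, tests)
for token, count in sorted(gaps.items()):
    print(f"untested: {token} (seen {count}x in production)")
```

In this sketch, the tokens for `mode` set to `"safe"` and for the nested `options/cache` field appear in production but in no test input, so they are reported as gaps. A fuller analysis could weight gaps by production frequency or compare whole BoW vectors rather than individual tokens.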
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Gao, Yifan, "Software Input Pattern and Test Coverage using Computational Linguistics on Structured Data", Technical Disclosure Commons, (September 07, 2021)