This disclosure describes computational linguistics techniques for analyzing software input patterns and test coverage. Structured input data, which can have an arbitrary and evolving schema, are obtained from production software and from testbeds and tokenized using tree traversal to generate a vocabulary, unigram statistics, and bags of words (BoWs). The BoWs are subjected to statistical analysis to programmatically discover software usage patterns in production, to identify test coverage, and to flag gaps in testing.
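
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names (`tokenize`, `bag_of_words`, `coverage_gaps`), the `path=value` token format, and the sample records are all assumptions made for illustration, and a simple vocabulary difference stands in for the statistical analysis, which this summary does not specify.

```python
from collections import Counter


def tokenize(node, path=""):
    """Depth-first traversal of a nested record, yielding one token per leaf.

    The 'path=value' token format is an assumption for this sketch. No fixed
    schema is required: any nesting of dicts and lists is walked.
    """
    if isinstance(node, dict):
        for key in sorted(node):
            child_path = f"{path}.{key}" if path else key
            yield from tokenize(node[key], child_path)
    elif isinstance(node, list):
        for item in node:
            yield from tokenize(item, f"{path}[]")
    else:
        yield f"{path}={node!r}"


def bag_of_words(record):
    """Token counts (a bag of words) for a single input record."""
    return Counter(tokenize(record))


def coverage_gaps(production, testbed):
    """Tokens seen in production but never in tests, with production counts."""
    prod_unigrams = Counter()
    for record in production:
        prod_unigrams.update(bag_of_words(record))
    test_vocabulary = set()
    for record in testbed:
        test_vocabulary.update(tokenize(record))
    return {tok: n for tok, n in prod_unigrams.items() if tok not in test_vocabulary}


# Hypothetical sample inputs, not taken from the disclosure.
production = [
    {"mode": "fast", "opts": {"retries": 3}},
    {"mode": "safe", "opts": {"retries": 3}},
]
testbed = [{"mode": "fast", "opts": {"retries": 3}}]

gaps = coverage_gaps(production, testbed)
print(gaps)
```

In this sketch, the token for `mode` set to `"safe"` appears in production but in no test record, so it is reported as a testing gap along with its production frequency.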

