Abstract

Data-driven software testing has no universally accepted best practice for test data generation, a task made difficult by the high-dimensional, interacting features and parameters of critical user journeys. This disclosure describes methods that use natural language processing (NLP) techniques to downsample a high-dimensional data space so that tests still cover important usage patterns and parameter interactions, even when these are edge cases. A horizontally scalable, NLP-inspired dataflow recognizes multidimensional patterns in structured logs and then samples the logs to cover those patterns. The pattern recognition and sampling stages can be preceded by a sessionization stage, which groups related log entries into sessions. Test data sampling is framed as an optimization problem constrained by a snippet coverage requirement, where each snippet represents a pattern that a machine learning model identifies as worth testing. An information-theoretic score measures test coverage. Although they originate in natural language processing, the described techniques apply to software testing and, more generally, to any situation where behavioral and usage patterns can be mined from structured logs to improve software reliability and guide business intelligence.
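As a rough illustration of the sampling step, the sketch below treats each structured log entry as a bag of `field=value` words, enumerates small word combinations as stand-ins for snippets, and greedily picks entries until every snippet is covered. Everything here is an assumption made for illustration: the function names, the sample fields, the use of exhaustive combinations in place of model-identified snippets, and the greedy set-cover heuristic are not taken from the disclosure, which does not specify its optimizer or its information-theoretic score.

```python
from itertools import combinations


def tokenize(entry):
    """Turn one structured log entry (a dict) into sorted 'field=value' words."""
    return sorted(f"{key}={value}" for key, value in entry.items())


def snippets(entry, max_order=2):
    """Hypothetical snippets: every combination of up to max_order words."""
    words = tokenize(entry)
    found = set()
    for order in range(1, max_order + 1):
        found.update(combinations(words, order))
    return found


def greedy_sample(logs, max_order=2):
    """Greedy set cover: choose entries until every snippet is covered."""
    per_entry = [snippets(entry, max_order) for entry in logs]
    uncovered = set().union(*per_entry)
    chosen = []
    while uncovered:
        # Pick the entry that covers the most still-uncovered snippets.
        best = max(range(len(logs)), key=lambda i: len(per_entry[i] & uncovered))
        chosen.append(best)
        uncovered -= per_entry[best]
    return chosen


# Hypothetical log entries; field names are illustrative only.
logs = [
    {"device": "phone", "action": "play", "codec": "h264"},
    {"device": "phone", "action": "play", "codec": "h264"},
    {"device": "tv", "action": "play", "codec": "vp9"},
    {"device": "phone", "action": "pause", "codec": "h264"},
]
chosen = greedy_sample(logs)
print(chosen)
```

In this toy input the duplicate second entry adds no new snippet and is left out of the sample, while the rarer `tv` and `pause` entries are retained because each contributes word pairs that appear nowhere else.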

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Gao, Yifan; Yang, Dan; Liu, Yongtai; Zhou, Haotian; Bernstein, Alon; and Qian, Zhenzhi, "Obtaining Test Data Using a Bag-of-Words Model on Structured Logs", Technical Disclosure Commons, (October 07, 2024)
https://www.tdcommons.org/dpubs_series/7411


Technical Disclosure Commons

Defensive Publications Series

Obtaining Test Data Using a Bag-of-Words Model on Structured Logs
