Abstract
The present disclosure provides a method and a system for scalable query sampling for large and imbalanced data. The method comprises determining feature families in the data, where each family consists of correlated features. The method further comprises determining importance scores for the features and using those scores to retain only the features that contribute significantly to the analysis of the agent system. Further, the method comprises determining deviation effect scores for the features of each query and aggregating those scores per query to determine a selection value. The sample of queries is determined based on the selection values.
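The steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the underlying formulas, so everything concrete here is an assumption made for illustration: features are grouped into families when their absolute Pearson correlation reaches a threshold, importance scores are supplied by the caller, the deviation effect score is taken to be importance times absolute z-score, scores are aggregated as the maximum per family summed across families, and queries are drawn by weighted sampling without replacement. The function names, parameters, and thresholds are hypothetical and are not taken from the disclosure.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def feature_families(columns, threshold=0.8):
    """Group features with |correlation| >= threshold (assumed rule, union-find)."""
    names = list(columns)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)
    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return list(families.values())


def selection_values(columns, importance, min_importance=0.1, threshold=0.8):
    """Aggregate per-feature deviation effect scores into one value per query."""
    # Keep only features whose (caller-supplied) importance is high enough.
    kept = {n: v for n, v in columns.items()
            if importance.get(n, 0.0) >= min_importance}
    families = feature_families(kept, threshold)
    stats = {n: (mean(v), pstdev(v)) for n, v in kept.items()}

    def deviation_effect(name, q):
        # Assumed definition: importance * |z-score| of the query's value.
        mu, sigma = stats[name]
        if sigma == 0:
            return 0.0
        return importance[name] * abs(kept[name][q] - mu) / sigma

    n_queries = len(next(iter(kept.values())))
    # Max within a family avoids double-counting correlated features.
    return [
        sum(max(deviation_effect(n, q) for n in family) for family in families)
        for q in range(n_queries)
    ]


def sample_queries(values, k, seed=0):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys)."""
    rng = random.Random(seed)
    keys = [(rng.random() ** (1.0 / v) if v > 0 else 0.0, i)
            for i, v in enumerate(values)]
    return sorted(i for _, i in sorted(keys, reverse=True)[:k])
```

Each entry in `columns` maps a feature name to one value per query, so `selection_values(columns, importance)` returns one selection value per query and `sample_queries(values, k)` returns the indices of the sampled queries. Under these assumptions, queries that deviate strongly on important features are more likely to be sampled, which is one plausible way to keep rare cases represented in imbalanced data.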
Publication Date
2026-01-07
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
KOSAN, MERT; SONAR, CHINMAY NARENDRA; LIU, CAN; AGRAWAL, SHUBHAM; and CHETIA, CHIRANJEET, "System And Method For Scalable Query Sampling for Large and Imbalanced Data", Technical Disclosure Commons, (January 08, 2026)
https://www.tdcommons.org/dpubs_series/9152