Defensive Publications Series

SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR EVALUATING GENERATIVE MACHINE LEARNING MODELS

Abstract

Systems, methods, and computer program products are provided for evaluating generative machine learning models. An example system includes a processor configured to determine queries and ground-truth answers associated with the queries. The processor is also configured to produce generated answers to the queries using the model that is being evaluated. The processor is further configured to input the queries, the ground-truth answers, and the generated answers to an evaluator model trained to evaluate each generated answer against its ground-truth answer on a grading scale associated with the accuracy, honesty, and completeness of the generated answer. The processor is further configured to determine scores associated with the generated answers, reject a first subset of answers based on a first subset of scores, and provide a second subset of answers based on a second subset of scores.
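
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `evaluate`, `Graded`, `toy_model`, and `toy_grader` are hypothetical, and the single numeric score, the 1-to-5 scale, and the threshold of 3.0 are assumptions, since the abstract does not say how accuracy, honesty, and completeness are combined or where the accept/reject boundary lies. The exact-match grader is only a placeholder for a trained evaluator model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Graded:
    """One generated answer together with the score the evaluator gave it."""
    query: str
    generated: str
    score: float


def evaluate(
    pairs: List[Tuple[str, str]],            # (query, ground-truth answer)
    generate: Callable[[str], str],          # model under evaluation
    grade: Callable[[str, str, str], float], # evaluator: (query, truth, answer) -> score
    threshold: float = 3.0,                  # assumed cut-off, not from the source
) -> Tuple[List[Graded], List[Graded]]:
    """Score every generated answer and split the results into (provided, rejected)."""
    provided: List[Graded] = []
    rejected: List[Graded] = []
    for query, truth in pairs:
        answer = generate(query)
        score = grade(query, truth, answer)
        item = Graded(query, answer, score)
        # Answers scoring below the threshold form the rejected subset.
        (provided if score >= threshold else rejected).append(item)
    return provided, rejected


def toy_model(query: str) -> str:
    """Hypothetical stand-in for the generative model being evaluated."""
    return {"What is 2 + 2?": "4"}.get(query, "I am not sure.")


def toy_grader(query: str, truth: str, answer: str) -> float:
    """Placeholder evaluator: exact match scores 5.0, anything else 1.0 (assumed scale)."""
    return 5.0 if answer.strip() == truth.strip() else 1.0


if __name__ == "__main__":
    data = [("What is 2 + 2?", "4"), ("What is the capital of France?", "Paris")]
    provided, rejected = evaluate(data, toy_model, toy_grader)
    print("provided:", provided)
    print("rejected:", rejected)
```

Passing the model and the evaluator as callables keeps the sketch independent of any particular model API; a real evaluator would replace `toy_grader` with a call to the trained evaluator model.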

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Wang, Yang; Hernandez, Alberto Garcia; Kyslyi, Roman; Kersting, Nicholas; Patil, Ajit Vilasrao; and Dutta, Ranjan, "SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR EVALUATING GENERATIVE MACHINE LEARNING MODELS", Technical Disclosure Commons, (June 03, 2025)
https://www.tdcommons.org/dpubs_series/8187
