Abstract
Systems, methods, and computer program products are provided for evaluating generative machine learning models. An example system includes a processor configured to determine queries and ground-truth answers associated with the queries. The processor is also configured to generate answers to the queries using the model being evaluated. The processor is further configured to input the queries, the ground-truth answers, and the generated answers to an evaluator model that is trained to evaluate each generated answer against the corresponding ground-truth answer on a grading scale reflecting the accuracy, honesty, and completeness of the generated answer. The processor is further configured to determine scores for the generated answers, reject a first subset of the answers based on a first subset of the scores, and provide a second subset of the answers based on a second subset of the scores.
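The evaluation flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the 1-to-5 scale per criterion, the mean-score threshold of 3.0 used to split rejected from provided answers, and the names `build_items`, `toy_evaluator`, and `evaluate` are all hypothetical. In particular, `toy_evaluator` is a word-overlap placeholder standing in for the trained evaluator model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical grading scale: each criterion is scored from 1 (worst) to 5 (best).
CRITERIA = ("accuracy", "honesty", "completeness")

# A model under evaluation maps a query to a generated answer.
Model = Callable[[str], str]
# An evaluator maps (query, ground truth, generated answer) to per-criterion scores.
Evaluator = Callable[[str, str, str], Dict[str, int]]


@dataclass
class EvalItem:
    query: str
    ground_truth: str
    generated: str


def build_items(qa_pairs: List[Tuple[str, str]], model: Model) -> List[EvalItem]:
    """Generate an answer for each query with the model being evaluated."""
    return [EvalItem(q, gt, model(q)) for q, gt in qa_pairs]


def toy_evaluator(query: str, ground_truth: str, generated: str) -> Dict[str, int]:
    """Placeholder for the trained evaluator model (hypothetical heuristic).

    Scores every criterion by word overlap with the ground truth; a real
    system would call the trained evaluator model instead.
    """
    truth = set(ground_truth.lower().split())
    gen = set(generated.lower().split())
    overlap = len(truth & gen) / len(truth) if truth else 0.0
    score = 1 + round(4 * overlap)  # map overlap in [0, 1] onto the 1..5 scale
    return {c: score for c in CRITERIA}


def evaluate(
    items: List[EvalItem], evaluator: Evaluator, threshold: float = 3.0
) -> Tuple[List[Tuple[EvalItem, float]], List[Tuple[EvalItem, float]]]:
    """Score each answer and split into (accepted, rejected) by mean score."""
    accepted, rejected = [], []
    for item in items:
        scores = evaluator(item.query, item.ground_truth, item.generated)
        mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
        (accepted if mean >= threshold else rejected).append((item, mean))
    return accepted, rejected


if __name__ == "__main__":
    qa_pairs = [
        ("What is the capital of France?", "Paris is the capital of France"),
        ("How many legs does a spider have?", "A spider has eight legs"),
    ]
    # Canned responses stand in for the generative model under evaluation.
    canned = {
        "What is the capital of France?": "Paris is the capital of France",
        "How many legs does a spider have?": "Spiders have six legs",
    }
    accepted, rejected = evaluate(build_items(qa_pairs, canned.__getitem__), toy_evaluator)
    print("provided:", [(i.query, s) for i, s in accepted])
    print("rejected:", [(i.query, s) for i, s in rejected])
```

Averaging the three criteria is only one possible policy; a stricter variant could reject any answer whose lowest criterion score falls below the threshold, so that, for example, a complete but dishonest answer is never provided.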
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Wang, Yang; Hernandez, Alberto Garcia; Kyslyi, Roman; Kersting, Nicholas; Patil, Ajit Vilasrao; and Dutta, Ranjan, "SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR EVALUATING GENERATIVE MACHINE LEARNING MODELS", Technical Disclosure Commons, (June 03, 2025)
https://www.tdcommons.org/dpubs_series/8187