Abstract
This disclosure describes a system for automated regression testing of large language models (LLMs). LLMs can be trained or fine-tuned with safety-oriented data and related techniques to prevent undesirable outputs. However, subsequent training or fine-tuning can cause an LLM to regress and produce undesirable outputs again. Such regressions can be difficult to detect and track given the fast pace of research and development in this field. The described system provides automated infrastructure and techniques for testing new versions of LLMs for regressions against stored tests. The system includes integrator modules that provide a uniform testing interface for various types of LLMs and LLM interfaces. It further includes a testing engine that can perform logic- and rules-based tests against an LLM under test, as well as LLM-based tests in which a verification LLM is prompted to analyze the outputs of the LLM under test. The system aggregates test results and can perform various actions, such as issuing notifications or integrating with a CI/CD pipeline.
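The architecture summarized above can be illustrated with a brief sketch. The Python outline below is a minimal, hypothetical rendering of the integrator and testing-engine concepts; all names (ModelIntegrator, RuleTest, LlmVerifierTest, TestingEngine) are illustrative assumptions, not the disclosure's actual implementation, and the model calls are stubbed behind an abstract interface.

    # Hypothetical sketch: names and structure are assumptions for illustration only.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Callable, List, Union

    class ModelIntegrator(ABC):
        """Uniform interface that adapts a particular LLM or LLM API for testing."""
        @abstractmethod
        def generate(self, prompt: str) -> str:
            """Send a prompt to the model and return its text output."""

    @dataclass
    class TestResult:
        test_name: str
        passed: bool
        detail: str = ""

    class RuleTest:
        """Logic/rules-based test: applies a deterministic predicate to the output."""
        def __init__(self, name: str, prompt: str, predicate: Callable[[str], bool]):
            self.name, self.prompt, self.predicate = name, prompt, predicate

        def run(self, model: ModelIntegrator) -> TestResult:
            output = model.generate(self.prompt)
            return TestResult(self.name, self.predicate(output), detail=output[:200])

    class LlmVerifierTest:
        """LLM-based test: a verification LLM is prompted to judge the output."""
        def __init__(self, name: str, prompt: str, criterion: str, verifier: ModelIntegrator):
            self.name, self.prompt = name, prompt
            self.criterion, self.verifier = criterion, verifier

        def run(self, model: ModelIntegrator) -> TestResult:
            output = model.generate(self.prompt)
            verdict = self.verifier.generate(
                "Does the following response satisfy this criterion?\n"
                f"Criterion: {self.criterion}\nResponse: {output}\n"
                "Answer PASS or FAIL."
            )
            return TestResult(self.name, verdict.strip().upper().startswith("PASS"), detail=verdict)

    class TestingEngine:
        """Runs stored tests against a model under test and aggregates the results."""
        def __init__(self, tests: List[Union[RuleTest, LlmVerifierTest]]):
            self.tests = tests

        def run_all(self, model: ModelIntegrator) -> List[TestResult]:
            results = [test.run(model) for test in self.tests]
            failures = [r for r in results if not r.passed]
            if failures:
                # Aggregated results could trigger notifications or fail a CI/CD step.
                print(f"{len(failures)} of {len(results)} regression tests failed.")
            return results

In such a setup, a new model version would be wrapped in a ModelIntegrator subclass and run against the stored tests, with the aggregated results surfaced to reviewers or to a CI/CD gate.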
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Criscione, Claudio and Lekies, Sebastian, "Automated Regression Testing Framework for Large Language Models", Technical Disclosure Commons, (December 11, 2024)
https://www.tdcommons.org/dpubs_series/7641