Abstract

Conventional methods for improving generative artificial intelligence (AI) prompts often rely on manual analysis of evaluation feedback, a process that can be time-consuming, subjective, and difficult to scale. This disclosure describes a data-driven pipeline for automated prompt refinement. The system operates in a feedback loop, ingesting evaluation data such as rater scores and qualitative comments. A large language model then performs a multi-stage analysis that may include sanitizing the data, diagnosing response failures, extracting patterns from successful responses, and resolving potentially contradictory feedback, ultimately generating a revised prompt. This approach can support a scalable and repeatable prompt engineering process, which may improve the quality of generative AI model outputs.
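
To make the described loop concrete, the sketch below shows one plausible way the stages could be wired together in Python. It is a minimal illustration under stated assumptions, not the disclosed implementation: the RaterRecord type, the call_llm stub, the instruction strings, and the score threshold are all hypothetical names introduced here for clarity.

```python
# Illustrative sketch of the multi-stage refinement loop described above.
# Every name (RaterRecord, call_llm, the instruction strings, the score
# threshold) is an assumption for this example, not part of the disclosure.

from dataclasses import dataclass


@dataclass
class RaterRecord:
    response: str   # the model output that was rated
    score: float    # numeric rater score, e.g. on a 1-5 scale
    comment: str    # qualitative rater comment


def call_llm(instruction: str, payload: str) -> str:
    """Stub for any chat-completion API; swap in a real client here."""
    raise NotImplementedError


def refine_prompt(prompt: str, records: list[RaterRecord],
                  threshold: float = 3.0) -> str:
    # Stage 1: sanitize the raw feedback (drop noise, PII, off-topic text).
    sanitized = call_llm(
        "Sanitize these rater comments; remove noise and personal data.",
        "\n".join(r.comment for r in records),
    )
    low = "\n".join(r.response for r in records if r.score < threshold)
    high = "\n".join(r.response for r in records if r.score >= threshold)
    # Stage 2: diagnose why low-scoring responses failed.
    failures = call_llm(
        "Diagnose the failure modes in these responses, given the "
        "sanitized feedback:\n" + sanitized, low)
    # Stage 3: extract patterns shared by high-scoring responses.
    patterns = call_llm(
        "Describe what these successful responses have in common:\n"
        + sanitized, high)
    # Stage 4: reconcile contradictory signals before rewriting.
    analysis = call_llm(
        "Resolve any contradictions between this failure diagnosis and "
        "these success patterns.", failures + "\n" + patterns)
    # Stage 5: emit the revised prompt, closing the feedback loop.
    return call_llm(
        "Rewrite this prompt to address the analysis:\n" + analysis, prompt)
```

In this reading, each stage is a separate model call whose output feeds the next, so a revised prompt can be re-evaluated and fed back through the same loop; the actual disclosure may partition the stages differently.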

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
