Closed-Loop System for Automated Prompt Refinement Based on Large Language Model Evaluation Feedback
Abstract
Some Large Language Model (LLM) evaluation frameworks can provide performance metrics and diagnostic information but may not offer a mechanism to translate these insights into specific, actionable improvements. As a result, developers often must manually interpret evaluation results and devise solutions, such as refining system prompts. A disclosed method relates to a closed-loop system that can be used to automate aspects of this process. In some configurations, the system identifies low-performing outputs from an evaluation. An LLM-based component, which may be called an insight generator, can then analyze these failures to produce structured action items. A prompt tuner module can subsequently use these action items to iteratively modify the original system prompt. Following a modification, new outputs may be generated and re-evaluated in a cycle, with the objective of meeting a desired performance goal. This process can serve as a method for improving LLM performance by refining prompts based on empirical feedback.
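The cycle described above can be summarized as a control loop. The Python sketch below is a minimal illustration of one possible arrangement; the disclosure does not specify concrete interfaces, so all names (refine_prompt, generate, evaluate, generate_insights, tune_prompt) and the scoring/threshold scheme are hypothetical assumptions made here for clarity, not the disclosed API.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch of the closed-loop prompt-refinement cycle.
# The four component callables stand in for the generation step, the
# evaluation framework, the LLM-based insight generator, and the prompt
# tuner module; their signatures are illustrative assumptions.

def refine_prompt(
    system_prompt: str,
    test_inputs: List[str],
    generate: Callable[[str, str], str],            # (prompt, input) -> output
    evaluate: Callable[[str, str], float],          # (input, output) -> score in [0, 1]
    generate_insights: Callable[[List[Tuple[str, str]]], List[str]],  # failures -> action items
    tune_prompt: Callable[[str, List[str]], str],   # (prompt, action items) -> new prompt
    target_score: float = 0.9,
    max_iterations: int = 10,
) -> str:
    """Iterate generate -> evaluate -> insight -> tune until the goal is met."""
    for _ in range(max_iterations):
        # Generate outputs for each test input under the current prompt,
        # then evaluate them.
        outputs = [generate(system_prompt, x) for x in test_inputs]
        scores = [evaluate(x, y) for x, y in zip(test_inputs, outputs)]

        # Stop once the aggregate score reaches the desired performance goal.
        if sum(scores) / len(scores) >= target_score:
            break

        # Identify low-performing outputs and have the insight generator
        # (an LLM-based component) turn them into structured action items.
        failures = [(x, y) for x, y, s in zip(test_inputs, outputs, scores)
                    if s < target_score]
        action_items = generate_insights(failures)

        # The prompt tuner applies the action items to modify the prompt;
        # the next iteration re-generates and re-evaluates outputs.
        system_prompt = tune_prompt(system_prompt, action_items)

    return system_prompt
```

In this reading, the empirical feedback loop lives entirely in the control flow: each pass converts evaluation failures into action items and then into a revised prompt, so the prompt improves only insofar as the re-evaluation confirms it.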
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Seyfi, Ali and Fang, Fang (Anna), "Closed-Loop System for Automated Prompt Refinement Based on Large Language Model Evaluation Feedback", Technical Disclosure Commons, (October 14, 2025).
https://www.tdcommons.org/dpubs_series/8716