Abstract
Techniques for training large language models (LLMs) rely either on sparse, explicit user feedback or on expensive human evaluation. These techniques fail to capture the nuanced conversational friction that users experience during multi-turn dialog. This disclosure describes techniques that, with user permission, leverage follow-up prompts entered by a user as a dense, scalable source of implicit feedback for post-training LLMs. A refinement framework classifies conversational turns that follow an LLM response based on the relationship between the turns. The framework categorizes user prompts into two classes: refinement (a negative signal indicating user friction and an attempt to fix a failure in the LLM response) and follow-up (a positive signal indicating that the user's goal was met). An LLM-based classification engine categorizes conversational logs, generating diagnostic metrics and a large dataset of implicit preferences. This dataset is integrated into the LLM post-training pipeline, which employs reinforcement learning (RL) techniques to fine-tune the LLM to reduce the likelihood of responses that lead to user refinements.
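
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the judge interface llm.complete, the prompt template, and the helper names classify_turn_pair and mine_preferences are all hypothetical names introduced here for clarity.

    # Hypothetical sketch: an LLM-based judge labels each user turn that
    # follows an assistant response as REFINEMENT or FOLLOW_UP, then the
    # logs are mined into reward-labeled records plus a diagnostic metric.

    REFINEMENT_PROMPT = """\
    Given an assistant response and the user's next prompt, label the next
    prompt as REFINEMENT (the user is trying to fix a failure in the
    response) or FOLLOW_UP (the user's goal was met and they moved on).

    Assistant response:
    {response}

    Next user prompt:
    {next_prompt}

    Label:"""

    def classify_turn_pair(llm, response: str, next_prompt: str) -> str:
        """Ask a judge LLM whether the user's next prompt is a refinement
        (negative signal) or a follow-up (positive signal)."""
        # llm.complete is an assumed text-completion interface.
        label = llm.complete(REFINEMENT_PROMPT.format(
            response=response, next_prompt=next_prompt)).strip().upper()
        return "REFINEMENT" if "REFINEMENT" in label else "FOLLOW_UP"

    def mine_preferences(llm, conversations):
        """Scan permissioned conversation logs and emit reward-labeled
        records for post-training, plus an aggregate refinement rate."""
        records, refinements, total = [], 0, 0
        for convo in conversations:
            # convo is a list of {"role": ..., "content": ...} turns.
            for i in range(len(convo) - 1):
                if convo[i]["role"] != "assistant" or convo[i + 1]["role"] != "user":
                    continue
                label = classify_turn_pair(llm, convo[i]["content"],
                                           convo[i + 1]["content"])
                total += 1
                refinements += (label == "REFINEMENT")
                records.append({
                    "prompt": convo[i - 1]["content"] if i > 0 else "",
                    "response": convo[i]["content"],
                    # Refinements carry a negative reward, follow-ups positive.
                    "reward": -1.0 if label == "REFINEMENT" else 1.0,
                })
        refinement_rate = refinements / total if total else 0.0  # diagnostic metric
        return records, refinement_rate

Records with a negative reward (refinements) could then be penalized in an RL fine-tuning objective, or paired against follow-up-labeled responses to the same prompt to form implicit preference pairs, so that the fine-tuned model assigns lower likelihood to responses that trigger refinements.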
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Chugh, Tushar; Mone, Aditya; Sharifi, Mehrbod; and Kumara, Karthik, "Improving Large Language Model Responses Using Implicit Feedback from Conversational Refinement Metrics", Technical Disclosure Commons (October 22, 2025).
https://www.tdcommons.org/dpubs_series/8760