Abstract

This publication describes a helpfulness evaluation framework that uses large language models to assess the relevance and completeness of automated support interaction transcripts. By interpreting conversation logs rather than relying solely on user feedback, the framework provides an objective measure of helpfulness and efficient resolution while avoiding the biases typical of manual surveys. The system analyzes user requests, assesses generated answers against specific criteria, and produces structured justifications along with failure categorization tags to identify and resolve specific service issues.
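
To make the described pipeline concrete, the following is a minimal Python sketch of an LLM-as-judge scoring step that returns a score, a justification, and failure tags. The publication does not specify a model provider, prompt wording, or tag taxonomy, so the `call_llm` function, the `JUDGE_PROMPT` text, and the example tag names are illustrative assumptions, not the framework's actual implementation.

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for any LLM completion API; the publication
    # does not name a provider, so wire this to your own client.
    raise NotImplementedError("Connect to an LLM completion endpoint.")

# Assumed judge prompt: scores helpfulness and requests structured JSON output.
JUDGE_PROMPT = """\
You are evaluating an automated support interaction transcript.
Score the assistant's helpfulness from 1 (unhelpful) to 5 (fully resolved),
judging the relevance and completeness of each answer against the user's request.
Respond with JSON: {{"score": <int>, "justification": "<one paragraph>",
"failure_tags": ["<tag>", ...]}}. Example tags (assumed, not from the
publication): "off_topic", "incomplete_answer", "unnecessary_escalation".
Use an empty list if no failure applies.

Transcript:
{transcript}
"""

@dataclass
class HelpfulnessResult:
    """Structured verdict for one session transcript."""
    score: int
    justification: str
    failure_tags: list[str] = field(default_factory=list)

def evaluate_transcript(transcript: str) -> HelpfulnessResult:
    # Ask the judge model for a structured verdict and parse its JSON reply.
    raw = call_llm(JUDGE_PROMPT.format(transcript=transcript))
    parsed = json.loads(raw)
    return HelpfulnessResult(
        score=int(parsed["score"]),
        justification=parsed["justification"],
        failure_tags=list(parsed.get("failure_tags", [])),
    )
```

Aggregating `HelpfulnessResult.failure_tags` across many transcripts would yield the failure categorization described above, letting operators prioritize the most frequent service issues.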

Keywords: large language model, support interaction, evaluation framework, helpfulness, automated scoring, customer support, sentiment analysis, session transcript, natural language processing.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
