Abstract
Language adherence is a metric that measures whether the responses of an artificial intelligence (AI) model adhere to the user's desired language(s). Language adherence can be a nuanced aspect of human-AI interaction because the expected language of the model's response need not be the same as the language used in the user's prompt. This disclosure describes techniques that enable a general-purpose large language model (LLM), without any model retraining, to operate as an accurate, context-aware, and specialized auto-rater that evaluates language adherence. The techniques include a framework for automated language evaluation built around a pipeline with the following components: evaluation data with manually labeled ground-truth language annotations; an automated rater model that uses an LLM configured with prompts engineered for language-adherence evaluation; and post-deployment human-in-the-loop verification that cross-validates the quality of the automated rater against human raters on the evaluation data.
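As an illustration of the pipeline outlined above, the following is a minimal Python sketch of an LLM-based language-adherence auto-rater whose verdicts are cross-checked against manually labeled ground-truth annotations. The rater prompt wording, the caller-supplied call_llm helper, and the data fields are hypothetical placeholders, not the disclosure's actual implementation.

    # Minimal sketch (hypothetical names): an LLM acting as a language-adherence
    # auto-rater, with agreement measured against ground-truth annotations.
    from dataclasses import dataclass

    @dataclass
    class EvalExample:
        prompt: str             # user prompt sent to the AI model under evaluation
        response: str           # the model response being rated
        expected_language: str  # desired response language (may differ from the prompt's language)
        ground_truth: bool      # manual label: does the response adhere to expected_language?

    RATER_PROMPT = (
        "You are a language-adherence rater. The user expects the response to be in {lang}.\n"
        "User prompt:\n{prompt}\n\nModel response:\n{response}\n\n"
        "Answer with exactly one word: ADHERES or VIOLATES."
    )

    def rate_example(example: EvalExample, call_llm) -> bool:
        """Ask a general-purpose LLM (via the caller-supplied call_llm) for an adherence verdict."""
        verdict = call_llm(RATER_PROMPT.format(
            lang=example.expected_language,
            prompt=example.prompt,
            response=example.response,
        ))
        return verdict.strip().upper().startswith("ADHERES")

    def rater_accuracy(examples: list[EvalExample], call_llm) -> float:
        """Agreement between the auto-rater and the ground-truth annotations on the evaluation set."""
        correct = sum(rate_example(ex, call_llm) == ex.ground_truth for ex in examples)
        return correct / len(examples)

In this sketch, the same accuracy computation can also be applied to human raters on the evaluation data, which is how the post-deployment human-in-the-loop cross-validation described in the abstract could be carried out.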
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Liu, Hongbin; Chen, Mingqing; and Wang, Lun, "High-quality Language Adherence Evaluation Using a Large Language Model with Ground-truth Language Annotations", Technical Disclosure Commons, (October 10, 2025)
https://www.tdcommons.org/dpubs_series/8706