Abstract
Systems and methods are described for automated content integrity assessment using a two-tier workflow of calibration followed by operation. During calibration, domain experts encode assessment criteria into structured prompts, supply exemplar calibration cases that span decision boundaries, review large language model (LLM) outputs, iteratively revise the prompts to address systematic errors, and define post-assessment quality rules that capture known failure modes. After calibration, the LLM operates autonomously to label candidate instances for measurement, producing structured outputs that include a decision and a reasoning trace. Candidate instances may include entity pairs with multimodal content portfolios, with requests constructed under input constraints such as a maximum number of videos. Candidates are selected by stratified sampling that combines higher-confidence candidates with a random sample, which supports unbiased prevalence estimation. Guardrails filter outputs before measurement. Baseline prevalence and prevalence reduction may be computed using victim cohort definitions and victim-level holdouts that compare treatment and control prevalence.
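The measurement steps named in the abstract (guardrail filtering, stratified prevalence estimation, and treatment-versus-control comparison) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the disclosure: the `Label` record, the stratum names, the single guardrail rule (reject an empty reasoning trace), the weighting of each stratum by its population share, and the definition of reduction as the relative difference between control and treatment prevalence.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical structured LLM output for one candidate instance."""
    stratum: str      # assumed stratum name, e.g. "high_confidence" or "random"
    violating: bool   # the LLM decision
    reasoning: str    # the reasoning trace


def passes_guardrails(label: Label) -> bool:
    # Assumed quality rule: discard outputs whose reasoning trace is empty.
    return bool(label.reasoning.strip())


def stratified_prevalence(labels: list[Label], stratum_sizes: dict[str, int]) -> float:
    """Weight each stratum's violation rate by its share of the population."""
    total = sum(stratum_sizes.values())
    estimate = 0.0
    for stratum, size in stratum_sizes.items():
        kept = [l for l in labels if l.stratum == stratum and passes_guardrails(l)]
        if not kept:
            raise ValueError(f"no usable labels for stratum {stratum!r}")
        rate = sum(l.violating for l in kept) / len(kept)
        estimate += (size / total) * rate
    return estimate


def prevalence_reduction(treatment: float, control: float) -> float:
    """Relative reduction of treatment prevalence against the control holdout."""
    if control <= 0:
        raise ValueError("control prevalence must be positive")
    return (control - treatment) / control
```

In this sketch, labels that fail the guardrail are dropped before rates are computed, so a malformed output neither raises nor lowers the estimate. Weighting by stratum size is one common way to keep an oversampled higher-confidence stratum from inflating the overall prevalence figure; the disclosure may use a different estimator.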
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Tiered Expert Calibration Framework for Autonomous Large Language Model Assessment Systems", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10649