Abstract

Retrieval-augmented generation (RAG) and tool-using LLM agents routinely emit citations that purport to tie a claim to a retrieved source, but a well-known failure mode is the dangling or fabricated citation that resolves to nothing. Existing mitigations largely rely on the model policing itself or on probabilistic post-hoc fact-checking. This disclosure describes a family of deterministic, model-independent mechanisms that make citation and gap integrity an invariant of the system rather than a behavior to be hoped for: (A) a stable cross-tool evidence registry with primary/derived provenance and a detect-bounce-strip citation-lifecycle guard, including compound-citation surgery, carried across human-feedback revisions; (B) conversation recall as derived evidence with citation-namespace sanitization; (C) corpus-verification of absence/negative claims before delivery; (D) multi-point domain-entity canonicalization; (E) full-document re-grounding for enumeration steps; and (F) deterministic detection of referenced-but-unindexed source documents. Each is described in enabling detail with pseudocode and an explicit enumeration of variants intended to constitute prior art. The mechanisms are independent of the language model, retriever, and provider, and run offline; a reference implementation exists in the document-review application RAGchat.

Keywords: defensive publication; prior art; retrieval-augmented generation; large language models; citation grounding; hallucination; agentic LLM; information retrieval
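
As a rough illustration of the detect and strip stages of the citation-lifecycle guard named in (A), the following is a minimal sketch and not the disclosure's own pseudocode. The bracketed citation syntax (`[E1]`, `[E1, E2]`), the function name `guard_citations`, and the representation of the registry as a plain set of IDs are all assumptions made for this example. The "bounce" stage, which would return the draft to the model for revision, is omitted because it requires a model call.

```python
import re

# Assumed citation syntax: bracketed IDs such as [E1] or [E1, E2].
CITATION = re.compile(r"\[([A-Za-z]\d+(?:\s*,\s*[A-Za-z]\d+)*)\]")


def guard_citations(text, registry):
    """Strip citation IDs absent from the registry.

    Returns (cleaned_text, dangling_ids). `registry` is a hypothetical
    stand-in for the evidence registry: any container of known IDs.
    """
    dangling = []

    def repair(match):
        ids = [part.strip() for part in match.group(1).split(",")]
        kept = [i for i in ids if i in registry]
        dangling.extend(i for i in ids if i not in registry)
        # Compound-citation surgery: keep valid members, drop the rest.
        return "[" + ", ".join(kept) + "]" if kept else ""

    cleaned = CITATION.sub(repair, text)
    # Tidy whitespace left behind by a fully removed citation.
    cleaned = re.sub(r"\s+([.,;:])", r"\1", cleaned)
    cleaned = re.sub(r" {2,}", " ", cleaned)
    return cleaned, dangling


draft = "The lease ends in 2027 [E1, E9]. Rent is fixed [E7]. Renewal is optional [E2]."
cleaned, dangling = guard_citations(draft, {"E1", "E2"})
print(cleaned)   # The lease ends in 2027 [E1]. Rent is fixed. Renewal is optional [E2].
print(dangling)  # ['E9', 'E7']
```

The check is a pure function of the draft text and the registry contents, so the same input always yields the same result regardless of which model or retriever produced the draft.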

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
