Inventor(s)

Abstract

Systems and methods are described for integrity verification of AI agent preference memory. Preference entries include semantic content, confidence, timestamp, source interaction identifier, and reinforcement history, and are associated with cryptographic provenance signatures and interaction context hashes. A preference consistency graph computes embedding-based consistency weights between preferences and produces anomaly scores for candidate preferences based on contradictions with stored high-confidence preferences. Confidence values may decay over time and be re-verified using subsequent behavior and a multi-source corroboration ladder. The system creates cryptographically signed checkpoints and performs targeted rollback to surgically remove unverifiable or anomalous preference entries while preserving verified entries, optionally re-deriving preferences from an interaction log. A platform-side attestation protocol cross-references agent-provided preference provenance against an independent platform interaction log and may serve non-personalized outputs when verification fails.
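
The mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the publication's implementation. Every name in it (`PreferenceEntry`, `sign`, `verify`, `decayed_confidence`, `anomaly_score`, `targeted_rollback`), the 30-day half-life, and the 0.8 high-confidence threshold are assumptions chosen for illustration. HMAC-SHA256 from the Python standard library stands in for the provenance signature scheme, which the abstract does not specify, and small hand-written vectors stand in for semantic embeddings.

```python
import hashlib
import hmac
import json
import math
from dataclasses import dataclass, field

HALF_LIFE_DAYS = 30.0   # assumed decay half-life, not taken from the source
HIGH_CONFIDENCE = 0.8   # assumed cutoff for "high-confidence" preferences


@dataclass
class PreferenceEntry:
    content: str                  # semantic content
    confidence: float             # confidence at write time, in [0, 1]
    timestamp: float              # seconds since the epoch
    source_interaction_id: str
    embedding: tuple              # stand-in for a semantic embedding
    reinforcement_history: list = field(default_factory=list)
    signature: str = ""           # provenance signature, hex encoded


def _payload(entry: PreferenceEntry) -> bytes:
    # Canonical bytes covered by the signature.
    fields = [entry.content, entry.timestamp, entry.source_interaction_id]
    return json.dumps(fields, separators=(",", ":")).encode()


def sign(entry: PreferenceEntry, key: bytes) -> None:
    # HMAC is a symmetric stand-in for a real digital signature.
    entry.signature = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()


def verify(entry: PreferenceEntry, key: bytes) -> bool:
    expected = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.signature)


def decayed_confidence(entry: PreferenceEntry, now: float) -> float:
    # Exponential decay: confidence halves every HALF_LIFE_DAYS.
    age_days = max(0.0, now - entry.timestamp) / 86400.0
    return entry.confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anomaly_score(candidate: PreferenceEntry, stored, now: float) -> float:
    # Strongest contradiction (negative cosine similarity) between the
    # candidate and any stored entry that is still high-confidence.
    score = 0.0
    for entry in stored:
        if decayed_confidence(entry, now) >= HIGH_CONFIDENCE:
            contradiction = max(0.0, -cosine(candidate.embedding, entry.embedding))
            score = max(score, contradiction)
    return score


def targeted_rollback(entries, key: bytes):
    # Drop only entries whose provenance cannot be verified; keep the rest.
    return [entry for entry in entries if verify(entry, key)]
```

In this sketch, a candidate whose embedding points opposite to a freshly stored, high-confidence preference scores near 1, while the same candidate scores 0 once the stored preference has decayed below the threshold. An entry that was never signed, or whose content was altered after signing, fails `verify` and is removed by `targeted_rollback` without touching the verified entries.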

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
