Defensive Publications Series

Agent Preference Memory Integrity Verification System for Detecting and Recovering from Preference Poisoning Attacks

Anonymous

Abstract

Systems and methods are described for integrity verification of AI agent preference memory. Preference entries include semantic content, confidence, timestamp, source interaction identifier, and reinforcement history, and are associated with cryptographic provenance signatures and interaction context hashes. A preference consistency graph computes embedding-based consistency weights between preferences and produces anomaly scores for candidate preferences based on contradictions with stored high-confidence preferences. Confidence values may decay over time and be re-verified using subsequent behavior and a multi-source corroboration ladder. The system creates cryptographically signed checkpoints and performs targeted rollback to surgically remove unverifiable or anomalous preference entries while preserving verified entries, optionally re-deriving preferences from an interaction log. A platform-side attestation protocol cross-references agent-provided preference provenance against an independent platform interaction log and may serve non-personalized outputs when verification fails.
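The mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the HMAC-based signature, the use of negative cosine similarity as a contradiction measure, the exponential half-life decay, and all names (`PreferenceEntry`, `sign`, `anomaly_score`, `targeted_rollback`, `SECRET_KEY`) are assumptions chosen for illustration. Embeddings are assumed to be precomputed elsewhere, and rollback here drops only entries that fail signature verification.

```python
import hashlib
import hmac
import json
import math
from dataclasses import dataclass, field

SECRET_KEY = b"demo-key"  # hypothetical signing key, for illustration only


@dataclass
class PreferenceEntry:
    content: str
    confidence: float
    timestamp: float                 # seconds since epoch
    source_interaction_id: str
    embedding: list                  # assumed precomputed semantic embedding
    reinforcement_history: list = field(default_factory=list)
    signature: str = ""

    def payload(self) -> bytes:
        # Serialize the fields that the provenance signature covers.
        return json.dumps(
            [self.content, self.confidence, self.timestamp,
             self.source_interaction_id]
        ).encode()


def sign(entry, key=SECRET_KEY):
    # Attach an HMAC-SHA256 provenance signature (assumed scheme).
    entry.signature = hmac.new(key, entry.payload(), hashlib.sha256).hexdigest()
    return entry


def is_verified(entry, key=SECRET_KEY):
    expected = hmac.new(key, entry.payload(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.signature)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decayed_confidence(entry, now, half_life=30 * 86400):
    # Exponential decay with an assumed 30-day half-life.
    age = max(0.0, now - entry.timestamp)
    return entry.confidence * 0.5 ** (age / half_life)


def anomaly_score(candidate, stored, min_confidence=0.8):
    # Strongest contradiction against any stored high-confidence preference;
    # contradiction is modeled (as an assumption) by negative cosine similarity.
    scores = [
        max(0.0, -cosine(candidate.embedding, p.embedding)) * p.confidence
        for p in stored
        if p.confidence >= min_confidence
    ]
    return max(scores, default=0.0)


def targeted_rollback(entries):
    # Remove only entries whose signature no longer verifies; keep the rest.
    return [e for e in entries if is_verified(e)]
```

In this sketch a candidate whose embedding points opposite to a stored high-confidence preference receives a high anomaly score and could be held back for corroboration, while an entry whose content was altered after signing fails `is_verified` and is dropped by `targeted_rollback` without disturbing verified entries.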

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Agent Preference Memory Integrity Verification System for Detecting and Recovering from Preference Poisoning Attacks", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10741
