Abstract
Autonomous remediation agents increasingly hold credentials to change production infrastructure (DNS, email authentication, CDN and TLS configuration) in response to machine-generated findings. The dominant assurance model for such systems is auditable: actions are logged for after-the-fact human review. We argue this is insufficient when the actor is an LLM-driven agent whose proposals are untrusted text, and we present VSAR (Verifiable Safe Autonomous Remediation), a safety kernel that makes every remediation verifiable instead of merely auditable. VSAR gates each proposed change through a fixed set of eight predicates, including a target-binding predicate that ties the change to the exact asset the finding was raised against, and emits a per-remediation verdict signed with EdDSA that a third party can verify in software, without trusting the vendor and without specialized hardware. We formally verify the kernel's decision logic in TLA+/TLC, establishing eight non-vacuous invariants (fail-closed on enforce, target-binding, kill-switch dominance mapped to EU AI Act Art. 14, and an autonomy gate), and report on a production deployment that transitioned from observe mode to enforce mode on the basis of operational data, with zero false positives across the observation window, in which the kernel caught a real wrong-target remediation that motivated the target-binding predicate. We position VSAR within a rapidly growing 2025-2026 literature on verifiable agent safety, and identify its contribution as the specific, deployed combination of remediation-domain predicates, software-only third-party-verifiable certificates, and regulation-mapped formal invariants.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Garcia Alonso, Inaki, "VSAR: A Verifiable, Formally-Verified Safety Kernel for Autonomous Security Remediation", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10462