Defensive Publications Series

Content Injection Firewall for Detecting and Neutralizing Adversarial Instructions in Agent-Facing Recommendation Content

AnonymousFollow

Abstract

Techniques are described for an agent-mediated content injection firewall in recommendation systems. Candidate recommendation content such as descriptions, reviews, and metadata is evaluated using a dual pipeline that yields a human-oriented quality score and an injection risk score. The injection risk score may be computed as a calibrated ensemble of instruction-pattern matching, obfuscation detection, a transformer-based instruction classifier, and an adversarial judge based on behavioral divergence of a language model when conditioned on the content versus a neutralized version. The injection risk score is integrated into ranking using an agent-adjusted penalty scaled by an agent vulnerability profile to produce soft demotion of risky items. Content delivered to agent-facing APIs may be wrapped with trust metadata and instruction-hierarchy controls, optionally with integrity hashing. Provenance and seller reputation are tracked to adjust baseline risk, and a red-team loop generates and tests adversarial variants to retrain detectors over time.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Content Injection Firewall for Detecting and Neutralizing Adversarial Instructions in Agent-Facing Recommendation Content", Technical Disclosure Commons, (June 30, 2026)
https://www.tdcommons.org/dpubs_series/10739

Download

COinS

Technical Disclosure Commons

Defensive Publications Series

Content Injection Firewall for Detecting and Neutralizing Adversarial Instructions in Agent-Facing Recommendation Content

Abstract

Creative Commons License

Recommended Citation

Browse

Search

Submit

Additional Information

Technical Disclosure Commons

Defensive Publications Series

Content Injection Firewall for Detecting and Neutralizing Adversarial Instructions in Agent-Facing Recommendation Content

Inventor(s)

Abstract

Creative Commons License

Recommended Citation

Share

Browse

Search

Submit

Additional Information