Abstract
Human–AI collaboration has become a central component of modern cybersecurity operations. Security analysts increasingly rely on automated systems for threat detection, risk assessment, and incident response. While this collaboration improves efficiency and scalability, it also introduces new vulnerabilities associated with human trust in machine-generated recommendations. This paper introduces Trust Amplification Exploits, a novel class of adversarial strategies that cultivate and then exploit overconfidence in AI-assisted security workflows. Rather than targeting algorithms directly, these attacks exploit a sustained record of accurate system outputs, interface design, and organizational dependence to amplify human reliance on automated outputs. Over time, excessive trust reduces critical evaluation and increases susceptibility to subtle manipulation. We analyze the psychological and technical foundations of trust formation in human–AI teams, develop a formal threat model, and present a taxonomy of exploitation mechanisms. Through realistic case studies, we demonstrate how attackers can weaponize automation bias to undermine defensive effectiveness without triggering conventional alerts. Finally, we propose a trust integrity framework aimed at balancing automation benefits with sustained human oversight. Our findings highlight that preserving calibrated trust is essential for the long-term resilience of AI-enabled security systems.
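
The abstract's notion of "calibrated trust" can be made concrete as the gap between how often analysts accept automated recommendations and how often those recommendations turn out to be correct. The following Python sketch is purely illustrative and is not taken from the disclosure; the class, function, and threshold interpretation are assumptions introduced here to show one plausible way such a metric could be tracked.

```python
# Illustrative sketch (not from the disclosure): measuring the gap between
# analyst acceptance rate and system accuracy as a proxy for trust calibration.
# All names and the 0.15 example below are assumptions for illustration only.

from dataclasses import dataclass


@dataclass
class RecommendationOutcome:
    accepted_by_analyst: bool   # did the analyst act on the AI recommendation?
    was_correct: bool           # was the recommendation correct in hindsight?


def trust_calibration_gap(history: list[RecommendationOutcome]) -> float:
    """Return acceptance rate minus accuracy over a window of outcomes.

    A value near 0 suggests calibrated trust; a large positive value suggests
    over-reliance (analysts accept more often than the system is right), which
    is the condition a trust amplification exploit would try to induce.
    """
    if not history:
        return 0.0
    acceptance_rate = sum(o.accepted_by_analyst for o in history) / len(history)
    accuracy = sum(o.was_correct for o in history) / len(history)
    return acceptance_rate - accuracy


# Example window: analysts accept 95% of recommendations, but only 80% were correct.
window = (
    [RecommendationOutcome(True, True)] * 80
    + [RecommendationOutcome(True, False)] * 15
    + [RecommendationOutcome(False, False)] * 5
)
print(f"calibration gap: {trust_calibration_gap(window):+.2f}")  # +0.15 -> over-reliance
```

Under this reading, a trust integrity framework would monitor such a gap over time and trigger additional human review when reliance drifts well above measured accuracy; the specific thresholds and responses would be organization-dependent.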
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhatanagar, Pranav Mr, "Trust Amplification Exploits in Human to AI Security Teams Abusing Overconfidence in Automated Recommendations", Technical Disclosure Commons, (February 09, 2026)
https://www.tdcommons.org/dpubs_series/9298