Abstract
The rapid adoption of artificial intelligence within cybersecurity has fundamentally altered both defensive and offensive security practices. In recent years, AI-driven tools have increasingly been deployed not only to protect systems but also to test them. Automated scanners, autonomous red-teaming agents, and AI-assisted penetration testing platforms can now probe complex infrastructures at a speed and scale that far exceed human capability. At the same time, modern production environments themselves depend increasingly on AI-based components, including adaptive intrusion detection systems, behavioural firewalls, anomaly detection engines, and control interfaces driven by large language models. This convergence has given rise to a new and largely unexamined phenomenon: AI systems actively testing, probing, and exploiting other AI systems.

This paper introduces and analyses the concept of AI-based pentesting of AI systems, focusing on the recursive security failures that emerge when machine-driven attackers and machine-driven defenders interact continuously without effective human mediation. Unlike traditional penetration testing, which is episodic, human-led, and constrained by time and expertise, AI-based pentesting operates continuously and adaptively. When defensive systems respond dynamically to automated probing, they may inadvertently expose new attack surfaces, alter decision boundaries, or destabilize previously secure configurations. Attacking AI systems can then exploit these changes, creating feedback loops that evolve faster than human operators can observe, understand, or control.

The central argument of this paper is that recursive AI-on-AI interaction represents a structural break from established security testing models. Failures in such environments are not linear, isolated, or easily attributable. Instead, they emerge gradually through adaptive interaction, often without triggering explicit alerts or policy violations. Human oversight mechanisms, designed for slower and more interpretable workflows, become increasingly symbolic rather than effective. By the time anomalous behaviour is detected, the underlying system state may already have shifted in response to ongoing automated interaction, making root cause analysis and remediation exceptionally difficult.

Through a detailed examination of AI-driven pentesting tools, adaptive defensive systems, and real-world deployment patterns, this work demonstrates how traditional assumptions about control, accountability, and remediation break down under recursive pressure. The paper explores how speed asymmetry between machines and humans amplifies risk, how feedback loops introduce instability, and how existing governance frameworks struggle to assign responsibility when failures arise from emergent machine behaviour rather than explicit human action.

Rather than framing AI-based pentesting as an inevitable improvement in security posture, this paper argues that it introduces a new class of systemic risk that must be explicitly addressed. Without rethinking how security testing, oversight, and governance are structured in AI-rich environments, organizations risk deploying systems that fail faster than humans can intervene. This work aims to contribute to an emerging discussion on AI-on-AI security dynamics, highlighting the need for new defensive paradigms that acknowledge recursion, adaptation, and the limits of human-in-the-loop assumptions.
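
The recursive feedback loop described above can be illustrated with a minimal, self-contained Python sketch. It is a toy model built entirely on assumptions, not a description of any real pentesting tool or of the systems analysed in this paper: the names AdaptiveDefender and ProbingAttacker, the intensity scores, the threshold update rule, and the review cadence are all hypothetical choices made purely for illustration.

# Hypothetical toy model of a recursive attacker/defender feedback loop.
# All classes, parameters, and update rules are illustrative assumptions.

class AdaptiveDefender:
    """Toy anomaly detector: blocks probes whose intensity score exceeds a
    threshold, then nudges that threshold toward traffic it just accepted."""

    def __init__(self, threshold=0.70, learning_rate=0.03):
        self.threshold = threshold
        self.learning_rate = learning_rate

    def inspect(self, intensity):
        blocked = intensity > self.threshold
        if not blocked:
            # Adaptive step: treat the accepted probe as "normal" traffic and
            # drift the decision boundary toward (and slightly beyond) it.
            drift = (self.learning_rate * (intensity - self.threshold)
                     + self.learning_rate * 0.3)
            self.threshold = min(1.0, self.threshold + drift)
        return blocked


class ProbingAttacker:
    """Toy attacker agent: escalates probe intensity whenever a probe is
    accepted and backs off slightly whenever it is blocked."""

    def __init__(self, intensity=0.30, step=0.02):
        self.intensity = intensity
        self.step = step

    def next_probe(self, last_blocked):
        if last_blocked:
            self.intensity = max(0.0, self.intensity - self.step)
        else:
            self.intensity = min(1.0, self.intensity + self.step)
        return self.intensity


defender = AdaptiveDefender()
attacker = ProbingAttacker()
HUMAN_REVIEW_EVERY = 25  # humans sample state far less often than machines interact

blocked = False
for step in range(1, 151):
    probe = attacker.next_probe(blocked)
    blocked = defender.inspect(probe)
    if step % HUMAN_REVIEW_EVERY == 0:
        # Human-cadence snapshot: by the time anyone looks, the decision
        # boundary has already drifted in response to machine-speed interaction.
        print(f"review at step {step:3d}: threshold={defender.threshold:.2f}, "
              f"probe intensity={probe:.2f}, blocked={blocked}")

Running the sketch prints only a handful of human-cadence snapshots; between them, the defender's threshold ratchets upward until it tolerates maximum-intensity probes without ever raising an explicit alert, which is the kind of speed asymmetry and gradual, interaction-driven drift the paper argues existing oversight models are not designed to catch.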
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhatanagar, Pranav, "AI-Based Pentesting of AI Systems: Recursive Security Failures", Technical Disclosure Commons, (February 09, 2026)
https://www.tdcommons.org/dpubs_series/9292