Abstract
Artificial intelligence systems, particularly large language models, are now routinely used in operational cybersecurity environments. They assist analysts with alert triage, incident response, threat intelligence interpretation, and day-to-day decision-making in situations where information is incomplete and time pressure is constant. Much of the existing research on AI security has focused on direct attacks such as prompt injection, jailbreaks, or data poisoning. In contrast, this paper examines a quieter but increasingly consequential risk: the impact of confident hallucinations in security-critical contexts. When AI systems provide incorrect guidance with high confidence, the resulting decisions can materially affect defensive posture even when no policy violation or system compromise has occurred. This work introduces the concept of hallucination-driven exploits, a class of failures and adversarial outcomes that arise from normal AI operation under uncertainty. Rather than manipulating the model directly, these exploits leverage false confidence and automation bias within human–AI decision processes. The paper analyzes how hallucinations manifest in cybersecurity settings, why confidence amplifies their impact, and why existing technical and governance approaches struggle to address this form of risk.
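To make the amplification mechanism concrete, the following is a minimal sketch (not taken from the disclosure) of a toy automation-bias model: an analyst is assumed to accept an AI triage verdict with a probability that rises with the model's stated confidence, so a hallucinated "benign" verdict delivered confidently is dismissed more often than the same wrong verdict delivered hesitantly. The names and parameters (analyst_accepts, base_trust, ai_error_rate) are illustrative assumptions, not artifacts of the paper.

# Toy illustration: confident hallucinations amplified by automation bias in alert triage.
# Assumption: acceptance probability grows linearly with the AI's stated confidence.

import random

random.seed(0)

def analyst_accepts(ai_confidence: float, base_trust: float = 0.4) -> bool:
    """Return True if the analyst defers to the AI verdict for this alert."""
    p_accept = base_trust + (1.0 - base_trust) * ai_confidence
    return random.random() < p_accept

def missed_threat_rate(n_alerts: int, ai_confidence: float, ai_error_rate: float) -> float:
    """Fraction of truly malicious alerts dismissed because a wrong AI verdict was accepted."""
    missed = 0
    for _ in range(n_alerts):
        ai_says_benign = random.random() < ai_error_rate  # hallucinated "benign" verdict
        if ai_says_benign and analyst_accepts(ai_confidence):
            missed += 1  # wrong verdict accepted -> real threat dismissed
    return missed / n_alerts

# Same underlying error rate; only the stated confidence differs.
for conf in (0.55, 0.95):
    print(f"stated confidence {conf:.2f} -> missed threat rate {missed_threat_rate(10_000, conf, 0.10):.3f}")

Under these assumptions the model with identical accuracy but higher stated confidence produces a markedly higher rate of dismissed real threats, which is the amplification effect the abstract describes: no prompt injection or compromise occurs, yet defensive posture degrades.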
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhatanagar, Pranav, "HALLUCINATION-DRIVEN EXPLOITS: WEAPONIZING AI FALSE CONFIDENCE IN CYBERSECURITY SYSTEMS", Technical Disclosure Commons (February 09, 2026).
https://www.tdcommons.org/dpubs_series/9291