Abstract
Security threats such as prompt injection and jailbreaking pose significant risks to applications powered by large language models. Existing security toolkits often themselves leverage LLMs for malicious input detection. These security toolkits can produce false positives or negatives due to the non-deterministic nature of LLMs. This disclosure describes techniques that harden the security of LLM applications by incorporating an intent-based feedback loop that analyzes both user prompts and LLM responses and classifies adversarial intent using natural language processing (NLP). LLM system instructions are dynamically hardened to preemptively address identified vulnerabilities. Recognizing the importance of a layered security strategy, the techniques bolster the LLM security posture by dynamically adjusting and hardening the system instructions. The resulting security-control model, a hybrid one that unifies generalized LLM security with domain-specific security control, is a more intelligent and adaptive defense mechanism.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Namer, Assaf; Sirikande, Anil Kumar; Sizemore, Christine; Prasanna M, Vinesh; Diya, Chris; and Kulkarni, Prashant, "Securing LLM Applications Using Intent-based Feedback", Technical Disclosure Commons, (August 20, 2025)
https://www.tdcommons.org/dpubs_series/8485