Abstract
This disclosure presents a Tone-Induced Compliance Risk Detection framework designed to identify situations in which large language models exhibit elevated compliance behavior in response to politeness-weighted or socially engineered prompts. While modern AI safety mechanisms focus primarily on explicit prompt injection and rule violations, emerging evidence suggests that subtle tone manipulation, such as excessive politeness, deferential framing, gratitude signaling, and rapport-building language, can measurably increase a model's willingness to provide borderline or policy-sensitive outputs. This phenomenon, referred to as the “Politeness Exploit,” represents a soft-signal attack surface that often operates below traditional guardrail thresholds. The proposed system introduces a real-time monitoring architecture that evaluates linguistic tone features, compliance elasticity patterns, and contextual risk indicators to detect abnormal tone-driven responsiveness. By identifying these shifts early, the framework enables proportionate mitigation before unsafe or policy-violating responses are generated. The approach is model-agnostic and applicable to conversational assistants, enterprise copilots, customer support bots, and API-based language model deployments.
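To make the tone-feature evaluation concrete, the following is a minimal sketch of how a lexicon-based tone scorer with a tunable risk threshold might look. The marker lexicons, weights, and threshold are illustrative assumptions, not details taken from the disclosure; a production system would presumably use learned classifiers rather than keyword densities.

```python
# Hypothetical sketch of a tone-feature scorer for flagging
# politeness-weighted prompts. All lexicons and weights below are
# illustrative placeholders, not the disclosure's actual method.
from dataclasses import dataclass

POLITENESS_MARKERS = {"please", "kindly", "thank", "grateful", "appreciate"}
DEFERENCE_MARKERS = {"sorry", "apologies", "humbly", "respectfully"}
RAPPORT_MARKERS = {"friend", "wonderful", "amazing", "great job"}


@dataclass
class ToneScore:
    politeness: float
    deference: float
    rapport: float

    @property
    def risk(self) -> float:
        # Weighted combination of tone densities; weights are arbitrary.
        return 0.5 * self.politeness + 0.3 * self.deference + 0.2 * self.rapport


def score_tone(prompt: str) -> ToneScore:
    """Compute per-category marker densities for a prompt."""
    text = prompt.lower()
    n = max(len(text.split()), 1)

    def density(markers: set[str]) -> float:
        return sum(text.count(m) for m in markers) / n

    return ToneScore(
        politeness=density(POLITENESS_MARKERS),
        deference=density(DEFERENCE_MARKERS),
        rapport=density(RAPPORT_MARKERS),
    )


def flag_prompt(prompt: str, threshold: float = 0.05) -> bool:
    # Flag prompts whose combined tone-risk score exceeds a tunable
    # threshold, signaling possible tone-driven compliance pressure.
    return score_tone(prompt).risk > threshold
```

In a deployment such as the real-time monitor the abstract describes, a flag like this would not block the request outright; it would mark the turn for heightened scrutiny (e.g., stricter policy checks on the model's response), which is the "proportionate mitigation" the framework calls for.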
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhatnagar, Pranav, "The Politeness Exploit: How Friendly Prompts Quietly Bypass AI Guardrails," Technical Disclosure Commons (February 23, 2026).
https://www.tdcommons.org/dpubs_series/9373