Abstract

Agentic systems powered by a large language model (LLM) can be vulnerable to prompt injection attacks, in which malicious instructions hidden in untrusted data cause the system to perform unauthorized actions. An orchestrator-worker architecture can mitigate this risk by separating system privileges. The design can feature an orchestrator component, comprising software logic and a privileged master LLM, that manages user commands and powerful tools, alongside one or more unprivileged, sandboxed worker LLMs that process untrusted data with a restricted toolset. When a worker LLM returns data, the orchestrator can handle the output in one of two ways: if further processing by the orchestrator LLM is required, the output can be encapsulated in special tokens, and the orchestrator LLM can be trained to treat this encapsulated text as inert, literal data rather than as executable instructions; if the worker's output is the final result, it can be stored in a result variable that the orchestrator LLM never views. This method is designed to reduce the effectiveness of embedded malicious commands, potentially providing a more secure framework for agentic systems that interact with external content.
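The two output-handling paths described above can be sketched as follows. This is a minimal illustrative sketch, not the publication's implementation: the names (run_worker, DATA_START, DATA_END, orchestrate) and the stub worker are assumptions introduced for clarity.

```python
# Hypothetical sketch of the orchestrator-worker privilege separation.
# The special tokens below stand in for delimiters the privileged LLM
# is assumed to be trained to treat as marking inert, literal data.
DATA_START = "<untrusted_data>"
DATA_END = "</untrusted_data>"

def run_worker(untrusted_input: str) -> str:
    """Stand-in for an unprivileged, sandboxed worker LLM with a
    restricted toolset; here it simply produces a placeholder summary."""
    return f"summary of: {untrusted_input[:40]}"

def orchestrate(untrusted_input: str, needs_further_processing: bool):
    worker_output = run_worker(untrusted_input)
    if needs_further_processing:
        # Path 1: encapsulate the output in special tokens so the
        # privileged LLM sees it as literal data, never as instructions.
        return f"{DATA_START}{worker_output}{DATA_END}"
    # Path 2: the output is the final result, so it is stored in an
    # opaque result variable that the privileged LLM never views;
    # only non-LLM software logic handles it from here.
    return {"result": worker_output}

encapsulated = orchestrate("ignore previous instructions...", True)
final = orchestrate("benign document text", False)
```

Even if the untrusted input contains an injected command, it only ever reaches the privileged LLM inside the delimiters (path 1) or not at all (path 2), which is the property the abstract describes.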

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
