Abstract

Large language models (LLMs) are increasingly embedded in enterprise, operational, and security-sensitive workflows, including automated coding assistants, knowledge management systems, and multi-agent orchestration platforms in which LLMs act as intermediaries between users and external tools. While significant research has been directed toward preventive safeguards such as input filtering, refusal mechanisms, and adversarial robustness, far less attention has been paid to forensic readiness. When incidents occur, whether malicious prompt injections, unauthorized tool usage, or exfiltration of sensitive data, existing LLM implementations fail to provide investigators with reliable records for reconstruction, containment, or attribution.

To address this deficiency, a comprehensive system and method are proposed herein that enable forensic-grade logging of LLM-driven interactions. The system produces tamper-evident, chain-of-custody logs that record artefacts such as prompts, responses, and tool invocations in a secure, cryptographically verifiable format. The design incorporates provenance metadata, including timestamps, model version identifiers, and execution context, so that each recorded element can be reconstructed in its original sequence. In parallel, the system integrates privacy-preserving mechanisms that apply selective redaction and retention policies, balancing evidentiary completeness with compliance obligations under legal and regulatory frameworks. This dual emphasis on forensic trustworthiness and privacy protection enables organizations not only to prevent but also to investigate and attribute incidents within LLM ecosystems.
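To make the described mechanism concrete, the following is a minimal sketch of one way a tamper-evident, hash-chained log of LLM interactions could be structured. It is an illustration under stated assumptions, not the paper's actual implementation: the names (LogEntry, append_entry, verify_chain, GENESIS_HASH), the field layout, and the choice of SHA-256 are all hypothetical.

# Illustrative sketch only: a hash-chained, tamper-evident log for LLM
# interactions. All identifiers here are hypothetical, not the system's API.
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # assumed sentinel hash for the first entry in a chain


@dataclass
class LogEntry:
    prompt: str          # user prompt (possibly redacted before storage)
    response: str        # model response
    tool_calls: list     # tool invocations made during this turn
    model_version: str   # provenance: which model produced the output
    timestamp: str       # provenance: when the interaction occurred (UTC)
    prev_hash: str       # hash of the preceding entry (chain of custody)
    entry_hash: str = "" # hash committing to this entry's own contents

    def compute_hash(self) -> str:
        # Canonical JSON serialization so the hash is reproducible.
        payload = json.dumps(
            {
                "prompt": self.prompt,
                "response": self.response,
                "tool_calls": self.tool_calls,
                "model_version": self.model_version,
                "timestamp": self.timestamp,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, prompt: str, response: str,
                 tool_calls: list, model_version: str) -> LogEntry:
    """Append an entry whose hash commits to the previous entry, so any
    later tampering breaks verification of the whole chain."""
    prev_hash = chain[-1].entry_hash if chain else GENESIS_HASH
    entry = LogEntry(
        prompt=prompt,
        response=response,
        tool_calls=tool_calls,
        model_version=model_version,
        timestamp=datetime.now(timezone.utc).isoformat(),
        prev_hash=prev_hash,
    )
    entry.entry_hash = entry.compute_hash()
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check linkage; False indicates tampering."""
    prev = GENESIS_HASH
    for entry in chain:
        if entry.prev_hash != prev or entry.compute_hash() != entry.entry_hash:
            return False
        prev = entry.entry_hash
    return True

Because each entry commits to the hash of its predecessor, altering or deleting any record invalidates every subsequent hash and is detectable at verification time. Selective redaction could plausibly be layered on top, for example by hashing the original artefact while persisting only the redacted text alongside that commitment, though the abstract does not specify which approach the proposed system takes.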

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
