Inventor(s)

Ritvik Shrivastava

Abstract

Guardrails are a set of limitations, guidelines, and operational protocols designed to govern the behavior and outputs of Large Language Models (LLMs). Current guardrail creation methods often face limitations such as lack of transparency, overly restrictive rules, and difficulty keeping pace with the evolving threat landscape. To overcome these limitations, techniques are proposed herein that provide automation for the generation of guardrails, or safeguarding rules, for LLMs using Reinforcement Learning (RL).

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS