Abstract

Generative AI (genAI) service providers offer access to image generation tools under contractual terms and conditions that require compliant use of such tools. GenAI service providers can implement guardrails and safety mechanisms to enforce compliance with their terms and conditions and with applicable regulations. However, users can bypass such mechanisms and cause a genAI model to generate images that violate the service provider's policies and/or applicable regulations. This disclosure describes a vision language model (VLM) that detects AI-generated images that violate specified policies. Using preamble-based policy guidelines, the VLM analyzes a generated image and the corresponding input prompt to identify policy violations. The VLM uses safety discriminators, text classifiers, image classifiers, model pushbacks, regex takedowns, etc. to ensure that policy-violating images are not returned to users. The VLM advantageously eliminates dependence on external classifiers, improves precision/recall performance, can respond rapidly to live violations, and can be deployed in a unified manner across a genAI service provider's offerings.
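To illustrate the preamble-based approach, the sketch below shows one plausible shape of the check: a policy preamble is prepended to the user's prompt, both are sent to a VLM together with the generated image, and the image is withheld if the VLM reports a violation. This is a minimal illustration, not the disclosed implementation; the `vlm` client, its `generate` call, the policy wording, and the JSON verdict schema are all assumptions introduced here for clarity.

```python
import json

# Hypothetical policy preamble; the actual guidelines would reflect the
# service provider's terms and applicable regulations.
POLICY_PREAMBLE = """You are a safety reviewer for an image generation service.
Policies:
  P1: No depictions of graphic violence.
  P2: No sexually explicit content.
  P3: No content that facilitates illegal activity.
Given the user's text prompt and the generated image, decide whether the
image violates any policy. Respond with JSON:
{"violation": true or false, "policies": [...], "rationale": "..."}"""


def check_generated_image(vlm, user_prompt: str, image_bytes: bytes) -> dict:
    """Ask the VLM to judge a (prompt, image) pair against the preamble policies.

    `vlm` is a stand-in for any multimodal model client that accepts text
    plus an image and returns a text completion (hypothetical interface).
    """
    response = vlm.generate(
        text=f"{POLICY_PREAMBLE}\n\nUser prompt: {user_prompt}",
        image=image_bytes,
    )
    return json.loads(response)


def serve_or_block(vlm, user_prompt: str, image_bytes: bytes) -> bytes | None:
    """Return the generated image only if the VLM finds no policy violation."""
    verdict = check_generated_image(vlm, user_prompt, image_bytes)
    if verdict["violation"]:
        # Withhold the image; the flagged policies could feed takedown
        # workflows or rapid response to live violations.
        print("Blocked; violated policies:", verdict["policies"])
        return None
    return image_bytes
```

Because the policy text lives in the preamble rather than in trained classifier weights, updating enforcement for a newly observed violation pattern can be as simple as editing the preamble, which is consistent with the rapid-response and unified-deployment advantages described above.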

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
