Abstract

Content output by generative artificial intelligence (gen AI) models and utilized in an application must deliver a quality user experience while also ensuring user safety. Examples of metrics for measuring the risk of model failure include adversarial block rate (ABR) and benign block rate (BBR). To achieve target ABR and BBR, a set of filters may be applied to content output by gen AI models. This disclosure describes techniques to automatically determine suitable operational parameters for downstream filters applied to content produced by gen AI models. Filter thresholds are selected such that content safety is ensured while also providing a satisfactory user experience. Per the techniques, an enhanced random search is performed across a predetermined range of filter thresholds as follows. Target ranges for ABR and BBR are established. Combinations of filter thresholds that yield ABR and BBR within the target ranges are selected, with the search range being adjusted after each iteration to ensure that boundary conditions are not overlooked. By iterating the random search and updating the search range at each step, candidate combinations are efficiently placed on a scatter plot and filter values that deliver target performance can be identified quickly.
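As an illustration only, the following is a minimal Python sketch of such an iterative random search. The evaluate() function, target ranges, padding amount, and iteration counts are hypothetical placeholders, since the actual filter stack, datasets, and metric targets are application-specific and not specified in the disclosure.

```python
import random

# Hypothetical target ranges for adversarial block rate (ABR) and
# benign block rate (BBR); real targets are application-specific.
ABR_TARGET = (0.95, 1.00)   # block at least 95% of adversarial content
BBR_TARGET = (0.00, 0.05)   # block at most 5% of benign content


def evaluate(thresholds):
    """Toy stand-in for filter evaluation. A real implementation would
    score labeled adversarial and benign datasets through the filter
    stack at the given thresholds and measure the resulting ABR/BBR."""
    t = min(thresholds)  # content is blocked if any filter fires
    abr = max(0.0, min(1.0, 1.15 - 0.5 * t))
    bbr = max(0.0, min(1.0, 0.15 - 0.5 * t))
    return abr, bbr


def iterative_random_search(num_filters, iterations=10, samples_per_iter=200):
    # Start with the full predetermined range for every filter threshold.
    low = [0.0] * num_filters
    high = [1.0] * num_filters
    accepted = []  # (thresholds, abr, bbr) combinations within target ranges

    for _ in range(iterations):
        for _ in range(samples_per_iter):
            thresholds = [random.uniform(lo, hi) for lo, hi in zip(low, high)]
            abr, bbr = evaluate(thresholds)
            if (ABR_TARGET[0] <= abr <= ABR_TARGET[1]
                    and BBR_TARGET[0] <= bbr <= BBR_TARGET[1]):
                accepted.append((thresholds, abr, bbr))

        if accepted:
            # Shrink the search range around the accepted combinations,
            # padding each side so boundary conditions are not overlooked.
            pad = 0.05  # hypothetical padding amount
            for i in range(num_filters):
                values = [t[i] for t, _, _ in accepted]
                low[i] = max(0.0, min(values) - pad)
                high[i] = min(1.0, max(values) + pad)

    return accepted


if __name__ == "__main__":
    results = iterative_random_search(num_filters=2)
    # Each accepted (thresholds, abr, bbr) tuple corresponds to a point
    # on an ABR-vs-BBR scatter plot from which a final operating point
    # can be chosen.
    print(f"{len(results)} threshold combinations within target ranges")
```

Because each iteration narrows the sampling range around combinations already known to meet the targets, later samples concentrate in the promising region of the scatter plot, which is what allows suitable filter values to be identified quickly.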

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
