Abstract

Content output by generative artificial intelligence (gen AI) models and utilized in an application must deliver a quality user experience while also ensuring user safety. Examples of metrics for measuring the risk of model failure include adversarial block rate (ABR) and benign block rate (BBR). To achieve target ABR and BBR, a set of filters may be applied to content output by gen AI models. This disclosure describes techniques to automatically determine suitable operational parameters for downstream filters applied to content produced by gen AI models. Filter thresholds are selected such that content safety is ensured while also providing a satisfactory user experience. Per the techniques, an enhanced random search is performed across a predetermined range of filter thresholds as follows. Target ranges for ABR and BBR are established. Combinations of filter thresholds that yield ABR and BBR within the target ranges are selected, with the search range being adjusted after each iteration to ensure that boundary conditions are not overlooked. By iterating the random search and updating the search range at each step, candidate combinations are efficiently placed on a scatter plot and filter values that deliver target performance can be identified quickly.
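As an illustration only, the following is a minimal Python sketch of such an iterative random search. The evaluate() function, target ranges, padding amount, and iteration counts are hypothetical placeholders, since the actual filter stack, datasets, and metric targets are application-specific and not specified in the disclosure.

```python
import random

# Hypothetical target ranges for adversarial block rate (ABR) and
# benign block rate (BBR); real targets are application-specific.
ABR_TARGET = (0.95, 1.00)   # block at least 95% of adversarial content
BBR_TARGET = (0.00, 0.05)   # block at most 5% of benign content


def evaluate(thresholds):
    """Toy stand-in for filter evaluation. A real implementation would
    score labeled adversarial and benign datasets through the filter
    stack at the given thresholds and measure the resulting ABR/BBR."""
    t = min(thresholds)  # content is blocked if any filter fires
    abr = max(0.0, min(1.0, 1.15 - 0.5 * t))
    bbr = max(0.0, min(1.0, 0.15 - 0.5 * t))
    return abr, bbr


def iterative_random_search(num_filters, iterations=10, samples_per_iter=200):
    # Start with the full predetermined range for every filter threshold.
    low = [0.0] * num_filters
    high = [1.0] * num_filters
    accepted = []  # (thresholds, abr, bbr) combinations within target ranges

    for _ in range(iterations):
        for _ in range(samples_per_iter):
            thresholds = [random.uniform(lo, hi) for lo, hi in zip(low, high)]
            abr, bbr = evaluate(thresholds)
            if (ABR_TARGET[0] <= abr <= ABR_TARGET[1]
                    and BBR_TARGET[0] <= bbr <= BBR_TARGET[1]):
                accepted.append((thresholds, abr, bbr))

        if accepted:
            # Shrink the search range around the accepted combinations,
            # padding each side so boundary conditions are not overlooked.
            pad = 0.05  # hypothetical padding amount
            for i in range(num_filters):
                values = [t[i] for t, _, _ in accepted]
                low[i] = max(0.0, min(values) - pad)
                high[i] = min(1.0, max(values) + pad)

    return accepted


if __name__ == "__main__":
    results = iterative_random_search(num_filters=2)
    # Each accepted (thresholds, abr, bbr) tuple corresponds to a point
    # on an ABR-vs-BBR scatter plot from which a final operating point
    # can be chosen.
    print(f"{len(results)} threshold combinations within target ranges")
```

Because each iteration narrows the sampling range around combinations already known to meet the targets, later samples concentrate in the promising region of the scatter plot, which is what allows suitable filter values to be identified quickly.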

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
