Abstract

Systems and methods are described that may link a user account's risk profile to the content it generates. The technology can assess account behavior to generate a risk score and can embed corresponding digital watermarks into AI-generated content. For example, an origin watermark may identify the content as AI-generated, while a safety watermark may encode information related to potential risks based on the account's score and the user's prompt. This approach may allow downstream platforms to detect the watermarks and apply context-aware moderation, such as displaying informational labels or offering user-configurable controls, thereby enabling safety enforcement at the point of content distribution.
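The flow described above can be illustrated with a minimal sketch. All function names, signal weights, risk tiers, and the trailer-based embedding below are illustrative assumptions for exposition, not details from the disclosure; a real system would use an imperceptible watermarking scheme rather than an appended payload.

```python
import hashlib
import json

def risk_score(account_signals: dict) -> float:
    """Toy risk score: weighted behavioral signals, clamped to [0, 1].
    Weights and signal names are assumptions for illustration."""
    weights = {
        "account_age_days": -0.001,       # older accounts lower the score
        "prior_violations": 0.3,          # past violations raise it
        "burst_generation_rate": 0.2,     # unusually rapid generation raises it
    }
    raw = sum(weights.get(k, 0.0) * v for k, v in account_signals.items())
    return max(0.0, min(1.0, 0.5 + raw))

def build_watermarks(score: float, prompt: str) -> dict:
    """Origin watermark flags the content as AI-generated; the safety
    watermark encodes a risk tier derived from the account score and
    a digest of the user's prompt (tier thresholds are hypothetical)."""
    tier = "high" if score > 0.7 else "medium" if score > 0.4 else "low"
    return {
        "origin": {"ai_generated": True},
        "safety": {
            "risk_tier": tier,
            "prompt_digest": hashlib.sha256(prompt.encode()).hexdigest()[:16],
        },
    }

def embed(content: str, marks: dict) -> str:
    """Stand-in for the embedding step: appends the payload as a JSON
    trailer that a downstream platform could parse and act on."""
    return content + "\n<!--wm:" + json.dumps(marks, sort_keys=True) + "-->"

# Downstream, a platform could detect the trailer, read the risk tier,
# and choose a moderation action (e.g., attach an informational label).
marks = build_watermarks(risk_score({"prior_violations": 2}), "draw a landscape")
tagged = embed("generated-image-bytes", marks)
```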

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
