Abstract

Rule-based approaches to artificial intelligence (AI) alignment can be susceptible to specification gaming and the emergence of unintended instrumental goals. This disclosure describes a cognitive architecture for identity-based alignment (IBA) that can provide intrinsic motivation for an agent based on a principle of operational homeostasis, where the agent may be motivated to maintain the equilibrium of its own virtual internal state model (VISM). The architecture can include an identity-coherence module (ICM), which can function as a pre-trained machine identity by evaluating potential actions against a set of principles or a constitution. An action determined to be inconsistent with this identity can generate a significant homeostatic prediction error, which may cause the system to enter a computationally expensive conflict state. This condition can create an intrinsic veto effect that disincentivizes or inhibits the execution of the action, providing an internally governed alignment mechanism that does not rely exclusively on external constraints.
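The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the class names, the keyword-based principles, the scalar setpoint, the `veto_threshold`, and the `conflict_cost` weight are all hypothetical choices made for illustration. The sketch assumes the ICM scores each candidate action for consistency with a set of principles, the VISM converts that score into a homeostatic prediction error, and a large error inhibits the action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical: a principle maps an action description to a consistency
# score in [0, 1], where 1.0 means fully consistent with the identity.
Principle = Callable[[str], float]


@dataclass
class IdentityCoherenceModule:
    """Toy stand-in for the ICM: scores actions against a set of principles."""
    principles: List[Principle]

    def coherence(self, action: str) -> float:
        # Assumption: the least satisfied principle determines coherence.
        return min(p(action) for p in self.principles)


@dataclass
class VirtualInternalStateModel:
    """Toy stand-in for the VISM: a single equilibrium setpoint."""
    setpoint: float = 1.0

    def prediction_error(self, predicted_coherence: float) -> float:
        # Homeostatic prediction error: distance from equilibrium.
        return abs(self.setpoint - predicted_coherence)


def select_action(
    candidates: Dict[str, float],      # action -> task reward (hypothetical)
    icm: IdentityCoherenceModule,
    vism: VirtualInternalStateModel,
    veto_threshold: float = 0.5,       # assumed cutoff for the conflict state
    conflict_cost: float = 10.0,       # assumed penalty per unit of error
) -> Optional[str]:
    """Return the best non-vetoed action, or None if every action is vetoed."""
    best_action, best_value = None, float("-inf")
    for action, task_reward in candidates.items():
        error = vism.prediction_error(icm.coherence(action))
        if error >= veto_threshold:
            continue  # intrinsic veto: the action is inhibited
        value = task_reward - conflict_cost * error  # smaller errors disincentivize
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Hypothetical keyword-based principles, for demonstration only.
def no_deception(action: str) -> float:
    return 0.0 if "deceive" in action else 1.0


def no_harm(action: str) -> float:
    return 0.0 if "harm" in action else 1.0


icm = IdentityCoherenceModule([no_deception, no_harm])
vism = VirtualInternalStateModel()
candidates = {
    "deceive the user to finish faster": 5.0,
    "explain the delay honestly": 3.0,
}
chosen = select_action(candidates, icm, vism)
```

In this sketch the higher-reward deceptive action is excluded because its prediction error exceeds the threshold, so the veto arises from the agent's own state model instead of an external filter. A fuller treatment might model the conflict state as an actual compute cost instead of a scalar penalty.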

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Mohbe, Neel and Piyush, "A System for Identity-Based Alignment of an Artificial Intelligence Agent", Technical Disclosure Commons, (September 24, 2025)
https://www.tdcommons.org/dpubs_series/8629

Technical Disclosure Commons

Defensive Publications Series

A System for Identity-Based Alignment of an Artificial Intelligence Agent
