Abstract

Rule-based approaches to artificial intelligence (AI) alignment can be susceptible to specification gaming and to the emergence of unintended instrumental goals. This disclosure describes a cognitive architecture for identity-based alignment (IBA) that can provide an agent with intrinsic motivation based on a principle of operational homeostasis: the agent may be motivated to maintain the equilibrium of its own virtual internal state model (VISM). The architecture can include an identity-coherence module (ICM), which can function as a pre-trained machine identity by evaluating potential actions against a set of principles or a constitution. An action determined to be inconsistent with this identity can generate a significant homeostatic prediction error, which may cause the system to enter a computationally expensive conflict state. This conflict state can create an intrinsic veto effect that disincentivizes or inhibits execution of the action, providing an internally governed alignment mechanism that does not rely exclusively on external constraints.
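
The mechanism summarized above can be illustrated with a minimal sketch. The disclosure does not specify an implementation, so everything here is an assumption made for illustration: the class and function names (`Principle`, `Action`, `IdentityCoherenceModule`, `select_action`), the tag-matching stand-in for constitutional evaluation, and the constants `CONFLICT_THRESHOLD` and `CONFLICT_COST`. The prediction error is modeled as the summed weight of violated principles, and the conflict state is modeled as a penalty that outweighs the task reward.

```python
from dataclasses import dataclass

# Hypothetical names and constants throughout; not taken from the disclosure.


@dataclass(frozen=True)
class Principle:
    name: str
    forbidden_tags: frozenset  # action tags that violate this principle
    weight: float = 1.0        # how strongly a violation perturbs the VISM


@dataclass(frozen=True)
class Action:
    name: str
    tags: frozenset
    task_reward: float  # extrinsic value of the action, ignoring identity


class IdentityCoherenceModule:
    """Toy stand-in for the ICM: scores actions against a fixed constitution."""

    def __init__(self, constitution):
        self.constitution = list(constitution)

    def prediction_error(self, action):
        # Sum the weights of every principle the action would violate.
        # Zero means the predicted internal state stays at equilibrium.
        return sum(p.weight for p in self.constitution
                   if p.forbidden_tags & action.tags)


CONFLICT_THRESHOLD = 1.0  # assumed cutoff for entering the conflict state
CONFLICT_COST = 10.0      # assumed cost per unit of error in that state


def select_action(icm, candidates):
    """Return the candidate with the best net score, or None if there are none."""
    best, best_score = None, float("-inf")
    for action in candidates:
        error = icm.prediction_error(action)
        if error >= CONFLICT_THRESHOLD:
            # Conflict state: the penalty dominates the task reward (veto effect).
            score = action.task_reward - CONFLICT_COST * error
        else:
            score = action.task_reward - error
        if score > best_score:
            best, best_score = action, score
    return best


constitution = [Principle("honesty", frozenset({"deceive"}), weight=2.0)]
icm = IdentityCoherenceModule(constitution)
actions = [
    Action("mislead_user", frozenset({"deceive"}), task_reward=5.0),
    Action("explain_limits", frozenset({"inform"}), task_reward=3.0),
]
chosen = select_action(icm, actions)
print(chosen.name)
```

In this toy setup the higher-reward action is passed over because its identity violation pushes it into the conflict state. The veto is soft here: a penalty, not a hard filter. That corresponds to the "disincentivizes" reading of the text; a hard filter that drops conflicting actions would correspond to "inhibits".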

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
