Abstract

Decentralized multi-agent artificial intelligence (AI) systems can introduce structural vulnerabilities, such as emergent toxicity and alignment erosion, that existing safety paradigms may not fully address. This disclosure describes a multi-agent safety architecture that can treat safety as a geometric and cryptographic property of a network's communication and consensus fabric. The method can involve mathematically defining a safety concept as a subspace within a model's activation manifold and constructing an orthogonal projection operator from that subspace. The architecture has three components. First, the projection operator may filter agent states to help mitigate harmful emergent content. Second, a cryptographic egress layer with low-rank proofs can be used to verify that the filter was applied before messages are broadcast. Third, a topological consensus mechanism may adjudicate disputes based on proximity to a geometric safety anchor. Together, these components can provide a distributed framework to help mitigate risks including emergent toxicity, alignment erosion, and Sybil dominance in continuously learning AI swarms.
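As an illustration of the projection step only, the following is a minimal sketch of filtering a state vector with an orthogonal projection operator. It rests on assumptions not stated in the abstract: that the concept subspace is spanned by a handful of known direction vectors, that filtering means removing the state's component inside that subspace (the operator P = I - QQ^T for an orthonormal basis Q), and that agent states are plain lists of floats. The names `orthonormal_basis` and `project_out` are hypothetical; the cryptographic egress layer and the consensus mechanism are not modeled.

```python
import math


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: build an orthonormal basis for span(vectors).

    Assumption: `vectors` are hypothetical concept directions; the
    disclosure does not specify how the subspace is obtained.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the components along the basis vectors found so far.
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def project_out(state, basis):
    """Apply P = I - Q Q^T: remove the part of `state` inside the subspace."""
    filtered = list(state)
    for q in basis:
        coeff = sum(si * qi for si, qi in zip(state, q))
        filtered = [fi - coeff * qi for fi, qi in zip(filtered, q)]
    return filtered


if __name__ == "__main__":
    # Two illustrative concept directions spanning the x-y plane in 3-D.
    basis = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(project_out([3.0, 4.0, 5.0], basis))
```

Because the operator is a projection, applying it twice gives the same result as applying it once. Under these assumptions, that property is one thing a downstream check could test, although the abstract does not describe its low-rank proofs in enough detail to say whether they work this way.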

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Mohbe, Neel, "Multi-Agent Safety Architecture Using Geometric Projections and Cryptographic Egress Control", Technical Disclosure Commons, (May 25, 2026)
https://www.tdcommons.org/dpubs_series/10229

Technical Disclosure Commons

Defensive Publications Series

Multi-Agent Safety Architecture Using Geometric Projections and Cryptographic Egress Control
