Abstract

In distributed computing environments with multiple interacting agents, interdependencies can contribute to localized agent failures escalating into system-wide outages, as some protocols may not adequately define dependency-based failover strategies. A control plane framework can manage such interdependencies by employing an agent relationship tree (ART). The ART can be a data structure that maps the relationships between agents and annotates each with a dependency type, for example, solid for dependencies designated as strongrelationship or fluid for those with lower importance. An agent orchestrator can monitor agent health and, upon detecting a failure, can consult the ART to initiate a context-aware recovery action based on the defined dependency type. This method can provide a structured failover mechanism that may improve overall system resilience and help maintain operational continuity by containing the impact of partial system failures.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS