Abstract

Proposed herein is a system to facilitate fault detection and root cause analysis (RCA) in distributed systems using a network of artificial intelligence (AI) agents that can work together to perform distributed debugging. The proposed system employs a hybrid architecture of centralized and localized AI agents, each co-located with network devices, enabling near-real-time, context-aware analysis of logs, traces, ring-buffers and other telemetry at the point of generation.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS