Abstract

The present disclosure describes a method and a system for managing faults in a distributed environment 102. The method includes monitoring health metrics of systems 106 in the distributed environment 102 and detecting faults associated with the systems 106 by identifying abnormal patterns in the monitored data. Based on the fault detection, the method includes reconfiguring the distributed environment 102 to maintain system resilience and performance, and determining a recovery action based on the severity of the faults. The method further includes logging and auditing the faults so that underlying issues can be analyzed and diagnosed.
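
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name `FaultManager`, the z-score test for abnormal patterns, the thresholds, the five-reading warm-up, and the action names "restart" and "isolate" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class FaultManager:
    """Hypothetical fault manager; every name and threshold here is assumed."""
    z_abnormal: float = 3.0   # assumed z-score above which a reading is a fault
    z_severe: float = 6.0     # assumed z-score above which a fault is severe
    min_history: int = 5      # assumed warm-up before detection starts
    active: set = field(default_factory=set)        # systems currently in service
    history: dict = field(default_factory=dict)     # system id -> normal readings
    audit_log: list = field(default_factory=list)   # fault records for diagnosis

    def observe(self, system_id, value):
        """Monitor one health-metric reading; return the recovery action or None."""
        past = self.history.setdefault(system_id, [])
        if len(past) >= self.min_history:
            mu, sigma = mean(past), pstdev(past)
            # A perfectly flat baseline gives sigma == 0; this sketch then
            # treats the reading as normal rather than dividing by zero.
            z = abs(value - mu) / sigma if sigma > 0 else 0.0
            if z > self.z_abnormal:
                action = self.recover(system_id, z)
                # Log the fault so it can be audited and diagnosed later.
                self.audit_log.append(
                    {"system": system_id, "value": value,
                     "z": round(z, 2), "action": action}
                )
                return action  # abnormal readings are kept out of the baseline
        past.append(value)
        return None

    def recover(self, system_id, z):
        """Choose a recovery action from fault severity."""
        if z > self.z_severe:
            # Reconfigure the environment: take the system out of service.
            self.active.discard(system_id)
            return "isolate"
        return "restart"


fm = FaultManager(active={"node-a", "node-b"})
for reading in [10, 11, 9, 10, 11]:
    fm.observe("node-a", reading)          # builds the baseline
mild = fm.observe("node-a", 13)            # moderately abnormal reading
severe = fm.observe("node-a", 50)          # strongly abnormal reading
```

In this sketch a moderate deviation triggers a restart while a severe one removes the system from the active set, and both events are recorded in `audit_log`. A real deployment would likely use richer metrics and detection than a single z-score.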

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
