In the observability domain, metrics, events, logs, and traces (MELT) are the basic data types generated by infrastructure and applications. These datasets are not only ingested at high volume and high frequency but are also interrelated. Currently available solutions each target an individual data type (e.g., metric monitoring, log analytics, or trace flow analysis) and therefore do not provide a holistic view of the entire environment with MELT correlation. To address these challenges, techniques are presented herein that support a scalable, flexible, dynamic, and adaptive noise reduction system. While the monitored environment is running as expected, data is collected at a lower frequency. When the first sign of trouble appears, the system may automatically increase the collection frequency to support change point detection, anomaly detection, log pattern detection, and causal inference. Aspects of the presented techniques employ a two-phase filtering mechanism, comprising Edge Processors and Global Processors, that intelligently applies machine learning techniques to scale monitoring and root cause analysis capabilities up and down.
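
The adaptive collection behavior described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class name `AdaptiveSampler`, the interval values, the window size, and the use of a rolling z-score as the "first sign of trouble" are all assumptions made for illustration, standing in for whatever change point or anomaly detection the Edge Processors actually apply.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveSampler:
    """Hypothetical sketch: choose a collection interval from recent metric values."""

    def __init__(self, slow_interval_s=60.0, fast_interval_s=5.0,
                 window=20, min_samples=5, z_threshold=3.0):
        # All defaults are illustrative assumptions, not values from the source.
        self.slow_interval_s = slow_interval_s
        self.fast_interval_s = fast_interval_s
        self.min_samples = min_samples
        self.z_threshold = z_threshold
        self.history = deque(maxlen=window)  # rolling window of recent values
        self.interval_s = slow_interval_s    # start in low-frequency mode

    def observe(self, value):
        """Record one metric value and return the interval to use next."""
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma == 0:
                deviates = value != mu
            else:
                deviates = abs(value - mu) / sigma > self.z_threshold
            # Collect faster while the value looks unusual, slower otherwise.
            self.interval_s = (self.fast_interval_s if deviates
                               else self.slow_interval_s)
        self.history.append(value)
        return self.interval_s


if __name__ == "__main__":
    sampler = AdaptiveSampler()
    for v in [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 25.0, 10.0]:
        print(v, sampler.observe(v))
```

In this sketch the sampler stays at the slow interval while values remain close to the rolling mean and switches to the fast interval as soon as a value deviates strongly. A production system would likely hold the fast interval for some time before scaling back down rather than reverting on the next normal value.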

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.