Sarah ThompsonFollow


Analysis of high cardinality time-series data is necessary in several contexts, such as root cause analysis based on logs, when outages occur in large computing infrastructure. Driving down mean time to mitigation (MTTM) depends on timely alerting, rapid root cause analysis, and effective mitigation. This disclosure describes differential anomaly detection to analyze system logs and automatically identify evidence of the likely root cause of a system outage in a way that is understandable to a human. The techniques described in this disclosure overcome the cardinality limitation and can find the most relevant information that is usable by engineers to take their investigation further. The techniques involve normalizing the data and outputting a list of Boolean predicates, sorted in increasing order of likelihood, that identify rows in system logs after an outage that are not in the logs prior to the outage.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.