Log files and reports on outages and bugs contain free-form text fields with information about system outages or bugs in a computer system. However, manually identifying patterns in such documents, such as common types of bugs, affected users, or critical components or binaries, can be tedious. This disclosure describes techniques to automatically categorize documents into multiple relevant categories based on key features extracted from the free-form text in the documents. The extracted features are subjected to dimensionality reduction. Clustering techniques are then applied to automatically identify clusters, which are used to generate visualizations. In the context of documents describing system outages or bugs, the cluster information can be used to automatically categorize incidents into applicable categories. The techniques can help in early mitigation of system outages and can save substantial manual effort.
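
The pipeline described above (feature extraction, dimensionality reduction, clustering) can be illustrated with a minimal sketch. This is not the disclosure's implementation: the sample reports, the stopword list, and all function names are hypothetical, TF-IDF is assumed as the feature extractor, a simple "keep the k heaviest terms" feature selection stands in for a true dimensionality-reduction method such as SVD or PCA, and a small k-means with farthest-first initialization stands in for whatever clustering technique is actually used.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "to", "and", "after", "when"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, rows); each row is an L2-normalized TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    n = len(docs)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
        row = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

def reduce_dimensions(vocab, rows, k):
    """Keep the k terms with the largest total weight across all documents.

    This is feature selection, used here as a simple stand-in for SVD or PCA.
    """
    totals = [sum(r[j] for r in rows) for j in range(len(vocab))]
    keep = sorted(sorted(range(len(vocab)), key=lambda j: -totals[j])[:k])
    return [vocab[j] for j in keep], [[r[j] for j in keep] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=20):
    """Cluster rows with k-means, using deterministic farthest-first seeding."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical free-form incident descriptions.
reports = [
    "Database connection timeout during checkout",
    "Checkout failed: database timeout on write",
    "Database timeout caused checkout errors",
    "Login page crash on mobile app",
    "Mobile app crash after login",
    "App crash when opening login screen on mobile",
]

vocab, rows = tfidf_vectors(reports)
terms, reduced = reduce_dimensions(vocab, rows, k=7)
labels = kmeans(reduced, k=2)

for label, report in zip(labels, reports):
    print(label, report)
```

On this toy input, the three database-timeout reports fall into one cluster and the three mobile-login crash reports into the other. The resulting cluster labels are the kind of information that could then drive a visualization or an automatic category assignment for each incident.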

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.