Machine learning-based run-time anomaly detection in software systems: An industrial evaluation

Fabian Huch, Mojdeh Golagha, Ana Petrovska, Alexander Krauss

Abstract

Anomalies are inevitable when operating enterprise software systems. Traditionally, they are detected by threshold-based alarms on critical metrics or by health-probing requests. However, fully automated detection in complex systems is challenging, because it is very difficult to distinguish truly anomalous behavior from normal operation, and the traditional approaches may therefore not be sufficient. We propose machine learning classifiers to predict the system’s health status. We evaluated our approach in an industrial case study on a large, real-world dataset of 7.5 × 10^6 data points with 231 features. Our results show that recurrent neural networks with long short-term memory (LSTM) are more effective at detecting anomalies and health issues than other classifiers. We achieved an area under the precision-recall curve of 0.44. At the default threshold, we can automatically detect 70% of the anomalies. Although the precision is low at 31%, the false positive rate is only 4%.
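
The combination of low precision and a low false positive rate reported in the abstract is typical of heavily imbalanced data, and a short calculation may make that concrete. The sketch below is not from the paper: the function names are hypothetical, and the anomaly share it computes is only what the rounded abstract figures (precision 31%, recall 70%, false positive rate 4%) would imply, not a number the authors report.

```python
def implied_prevalence(precision: float, recall: float, fpr: float) -> float:
    """Share of anomalous points implied by precision, recall, and false positive rate.

    With P anomalous and N normal points:
        precision = recall*P / (recall*P + fpr*N)
    Solving for the ratio P/N gives
        P/N = precision*fpr / (recall*(1 - precision)).
    """
    ratio = (precision * fpr) / (recall * (1.0 - precision))
    # Convert the ratio P/N into the share P/(P+N).
    return ratio / (1.0 + ratio)


def precision_from_rates(recall: float, fpr: float, prevalence: float) -> float:
    """Precision that results from given recall, false positive rate, and anomaly share."""
    true_positives = recall * prevalence
    false_positives = fpr * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Rounded figures from the abstract; the result is an illustration only.
    share = implied_prevalence(precision=0.31, recall=0.70, fpr=0.04)
    print(f"implied anomaly share: {share:.3f}")
    print(f"precision recovered:   {precision_from_rates(0.70, 0.04, share):.2f}")
```

Under these assumptions, the implied anomaly share is roughly 2.5% of all data points. When anomalies are that rare, even a 4% false positive rate on the far larger normal class produces more false alarms than true detections, which is why precision stays low.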

Type

Conference

Publication

2018 IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)

Date

2018
