Abstract— In modern cloud computing systems, hundreds or even thousands of cloud servers are interconnected by multi-layer networks. In such large-scale and complex systems, failures are common. Proactive failure management is a crucial technology for characterizing system behaviors and forecasting failure dynamics in the cloud. To make failure predictions, we need to monitor system execution and collect health-related runtime performance data. However, in newly deployed or newly managed cloud systems, these data are usually unlabeled, so approaches based on supervised learning are not suitable. In this paper, we present an unsupervised failure detection method using an ensemble of Bayesian models. It characterizes the normal execution states of the system and detects anomalous behaviors. After the anomalies are verified by system administrators, labeled data become available. We then apply supervised learning based on decision tree classifiers to predict future failure occurrences in the cloud. Experimental results from an institute-wide cloud computing system show that our methods can achieve a high true positive rate and a low false positive rate for proactive failure management.
Index Terms— Cloud computing, Dependability assurance, Failure prediction, Unsupervised and supervised learning, Ensemble of Bayesian models, Decision tree.

Cite: Qiang Guan, Ziming Zhang, and Song Fu, "Ensemble of Bayesian Predictors and Decision Trees for Proactive Failure Management in Cloud Computing Systems," Journal of Communications, vol. 7, no. 1, pp. 52-61, 2012. Doi: 10.4304/jcm.7.1.52-61
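
The two-stage workflow summarized in the abstract (unsupervised detection on unlabeled data, then supervised prediction once labels exist) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each ensemble member is a set of independent per-metric Gaussians fitted on a bootstrap resample, that the lowest-scoring fraction of samples is flagged as anomalous, that administrators confirm every flagged sample, and that a one-level decision tree (a stump) stands in for the decision tree classifier. All function names and the synthetic CPU and memory readings are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian_model(rows):
    """Fit one simple Bayesian model: an independent Gaussian per metric."""
    columns = list(zip(*rows))
    # A small floor on the standard deviation avoids division by zero.
    return [(statistics.fmean(c), max(statistics.pstdev(c), 1e-6)) for c in columns]


def log_likelihood(model, row):
    """Log-probability of one measurement vector under a fitted model."""
    total = 0.0
    for (mu, sigma), x in zip(model, row):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (x - mu) ** 2 / (2 * sigma ** 2)
    return total


def fit_ensemble(rows, n_models=10, seed=0):
    """Fit each ensemble member on a bootstrap resample of the unlabeled data."""
    rng = random.Random(seed)
    return [fit_gaussian_model(rng.choices(rows, k=len(rows))) for _ in range(n_models)]


def detect_anomalies(ensemble, rows, fraction=0.05):
    """Flag the lowest-scoring fraction of rows (average log-likelihood)."""
    scores = [statistics.fmean([log_likelihood(m, r) for m in ensemble]) for r in rows]
    k = max(1, int(len(rows) * fraction))
    cutoff = sorted(scores)[k - 1]
    return [s <= cutoff for s in scores]


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree: the (feature, threshold) split with fewest errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for above_is_failure in (True, False):
                errors = sum(
                    ((r[j] > t) == above_is_failure) != y for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, above_is_failure)
    return best[1:]


def predict_stump(stump, row):
    """Predict failure (True) or normal (False) for one measurement vector."""
    j, t, above_is_failure = stump
    return (row[j] > t) == above_is_failure


# Synthetic, unlabeled (cpu %, memory %) samples: 95 normal and 5 faulty.
rng = random.Random(1)
normal = [(rng.gauss(30, 5), rng.gauss(40, 5)) for _ in range(95)]
faulty = [(rng.gauss(95, 2), rng.gauss(90, 2)) for _ in range(5)]
data = normal + faulty

# Stage 1: unsupervised detection of anomaly candidates.
ensemble = fit_ensemble(data)
flags = detect_anomalies(ensemble, data, fraction=0.05)

# Stage 2: treat the flags as administrator-verified labels (an assumption)
# and train a supervised predictor on them.
stump = fit_stump(data, flags)
```

Once fitted, `predict_stump(stump, (97.0, 88.0))` classifies a new measurement vector. In practice the `fraction` cutoff trades true positives against false positives, which is the balance the abstract reports on.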