TY - GEN
T1 - Failure Classification for Microservice Systems Based on Variational Graph Auto-Encoders
AU - Sun, Wu
AU - Chen, Panfeng
AU - Chen, Mei
AU - Li, Hui
AU - Wang, Yanhao
AU - Huang, Gang
AU - Li, Hongyuan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
AB - Failure classification (FC) is a crucial problem in microservice systems, as it enables precise failure localization, reduces the mean time to repair (MTTR), and helps ensure that service level agreements (SLAs) are maintained. However, existing FC methods mostly rely on independent anomaly detectors or cascaded feature extraction modules to handle multimodal monitoring data (e.g., logs, metrics, and traces). These multi-stage pipelines suffer from error accumulation and amplification across stages, which leads to suboptimal performance. To address this issue, we propose FC-VGAE, a new failure classification method based on a variational graph auto-encoder (VGAE) with multimodal data fusion and joint feature extraction. Specifically, FC-VGAE first builds microservice invocation graphs (MIGs) from the monitoring data. It then uses a semi-supervised VGAE to capture the normal behavior of the microservice system and computes reconstruction errors for all nodes in the MIGs; these errors are fed into a multi-layer perceptron (MLP) that classifies the failure types. Finally, we evaluate FC-VGAE on two large-scale real-world microservice datasets. The results show that FC-VGAE improves the F1-score over state-of-the-art baseline methods by about 21% and 19% on the two datasets, respectively, validating its effectiveness for microservice failure classification.
KW - Failure classification
KW - Graph neural networks
KW - Microservice systems
KW - Multimodal data fusion
UR - https://www.scopus.com/pages/publications/105028327215
U2 - 10.1007/978-981-95-5012-8_14
DO - 10.1007/978-981-95-5012-8_14
M3 - Conference contribution
AN - SCOPUS:105028327215
SN - 9789819550111
T3 - Lecture Notes in Computer Science
SP - 189
EP - 204
BT - Service-Oriented Computing - 23rd International Conference, ICSOC 2025, Proceedings
A2 - Aiello, Marco
A2 - Georgievski, Ilche
A2 - Deng, Shuiguang
A2 - Murillo, Juan-Manuel
A2 - Benatallah, Boualem
A2 - Wang, Zhongjie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Service-Oriented Computing, ICSOC 2025
Y2 - 1 December 2025 through 4 December 2025
ER -