Failure Classification for Microservice Systems Based on Variational Graph Auto-Encoders

  • Wu Sun
  • Panfeng Chen
  • Mei Chen
  • Hui Li*
  • Yanhao Wang
  • Gang Huang
  • Hongyuan Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Failure classification (FC) is a crucial problem in microservice systems, as it enables precise failure location, reduces the mean time to repair (MTTR), and ensures that service level agreements (SLAs) are maintained. However, existing methods for FC mostly rely on independent anomaly detectors or cascade feature extraction modules to handle multimodal monitoring data (e.g., logs, metrics, and traces), which suffer from error accumulation and amplification over multi-stage pipelines, leading to suboptimal performance. To address this issue, we propose FC-VGAE, a new failure classification method based on the variational graph auto-encoder with multimodal data fusion and joint feature extraction. Specifically, it first builds microservice invocation graphs (MIGs) from monitoring data. Then, it utilizes a semi-supervised VGAE to capture the normal behavior of the microservice system and produces the reconstruction errors for all nodes in MIGs, which are fed into a multi-layer perceptron (MLP) to classify the failure types. Finally, we evaluate FC-VGAE on two large-scale real-world microservice datasets. The results show that FC-VGAE improves over state-of-the-art baseline methods by about 21% and 19%, respectively, in F1-scores on the two datasets, validating its superiority for microservice failure classification.
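The pipeline described in the abstract (invocation graph, VGAE reconstruction errors, MLP classifier) can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions, not the authors' FC-VGAE implementation: it assumes the standard VGAE formulation (GCN encoder, reparameterized Gaussian latents, inner-product decoder over the adjacency matrix), whereas the paper may reconstruct different quantities. The toy graph, features, weight shapes, and all function names (`node_reconstruction_errors`, `mlp_classify`, `random_matrix`) are hypothetical, and the semi-supervised training step is omitted entirely.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def node_reconstruction_errors(adj, feats, w_mu, w_logvar, rng):
    """One VGAE forward pass; returns one reconstruction error per node."""
    a_norm = normalize_adjacency(adj)
    mu = matmul(a_norm, matmul(feats, w_mu))          # mean of q(z_i | X, A)
    logvar = matmul(a_norm, matmul(feats, w_logvar))  # log-variance of q(z_i | X, A)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(m_row, lv_row)]
         for m_row, lv_row in zip(mu, logvar)]
    n = len(adj)
    errors = []
    for i in range(n):
        # Inner-product decoder: p(A_ij = 1) = sigmoid(z_i . z_j).
        sq = [(adj[i][j] - sigmoid(sum(a * b for a, b in zip(z[i], z[j])))) ** 2
              for j in range(n)]
        errors.append(sum(sq) / n)  # mean squared error over node i's edges
    return errors


def mlp_classify(errors, w1, w2):
    """Untrained one-hidden-layer MLP: ReLU, then softmax over failure types."""
    hidden = [max(0.0, h) for h in matmul([errors], w1)[0]]
    logits = matmul([hidden], w2)[0]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # toy 3-service invocation graph (assumed)
feats = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3]]  # per-service features (assumed)
errors = node_reconstruction_errors(
    adj, feats, random_matrix(2, 2, rng), random_matrix(2, 2, rng), rng)
probs = mlp_classify(errors, random_matrix(3, 4, rng), random_matrix(4, 4, rng))
print(errors, probs)  # per-node errors, then probabilities over 4 assumed failure types
```

In this sketch the classifier input is simply the vector of per-node errors for a fixed-size graph. A trained system would fit the encoder on normal behavior first, so that large errors flag the services deviating from it.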

Original language: English
Title of host publication: Service-Oriented Computing - 23rd International Conference, ICSOC 2025, Proceedings
Editors: Marco Aiello, Ilche Georgievski, Shuiguang Deng, Juan-Manuel Murillo, Boualem Benatallah, Zhongjie Wang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 189-204
Number of pages: 16
ISBN (Print): 9789819550111
State: Published - 2026
Event: 23rd International Conference on Service-Oriented Computing, ICSOC 2025 - Shenzhen, China
Duration: 1 Dec 2025 → 4 Dec 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16320 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Service-Oriented Computing, ICSOC 2025
Country/Territory: China
City: Shenzhen
Period: 1/12/25 → 4/12/25

Keywords

  • Failure classification
  • Graph neural networks
  • Microservice systems
  • Multimodal data fusion
