TY - JOUR
T1 - Noise Matters
T2 - 51st International Conference on Very Large Data Bases, VLDB 2025
AU - Zhuang, Zhihao
AU - Zhang, Yingying
AU - Zhao, Kai
AU - Guo, Chenjuan
AU - Yang, Bin
AU - Wen, Qingsong
AU - Fan, Lunting
N1 - Publisher Copyright: © Publication rights licensed to the VLDB Endowment.
PY - 2025
Y1 - 2025
N2 - Flink clusters often suffer from hotspot issues where the monitored job delay and CPU usage keep rising and remain high. This necessitates the detection of anomalous time series to pinpoint the hotspot machines. However, the state-of-the-art unsupervised time series anomaly detection (UTAD) methods are ineffective in this scenario. We identify two main reasons for this. First, the hotspot scenario requires us to pay particular attention to Flink-specific anomalies, e.g., slow-rising and high-level anomalies, which the existing methods struggle to address. Second, the state-of-the-art anomaly detection methods often assume that training datasets do not contain anomalies, but the data collected from the running Flink clusters contains noise, which causes these methods to learn anomalous patterns as normal patterns. In this paper, we first conduct experiments to analyze why existing methods fail in the Flink scenario. To tackle these challenges, we propose a cross-contrastive approach to learn the context information for each timestamp to enable Flink-specific anomaly detection. Then, to address noisy anomalies, we incorporate prior knowledge to set an anomaly boundary to prevent the model from learning anomalous patterns. Extensive experiments show that our method not only outperforms existing methods in the Flink scenario but also achieves state-of-the-art results on public benchmark datasets.
AB - Flink clusters often suffer from hotspot issues where the monitored job delay and CPU usage keep rising and remain high. This necessitates the detection of anomalous time series to pinpoint the hotspot machines. However, the state-of-the-art unsupervised time series anomaly detection (UTAD) methods are ineffective in this scenario. We identify two main reasons for this. First, the hotspot scenario requires us to pay particular attention to Flink-specific anomalies, e.g., slow-rising and high-level anomalies, which the existing methods struggle to address. Second, the state-of-the-art anomaly detection methods often assume that training datasets do not contain anomalies, but the data collected from the running Flink clusters contains noise, which causes these methods to learn anomalous patterns as normal patterns. In this paper, we first conduct experiments to analyze why existing methods fail in the Flink scenario. To tackle these challenges, we propose a cross-contrastive approach to learn the context information for each timestamp to enable Flink-specific anomaly detection. Then, to address noisy anomalies, we incorporate prior knowledge to set an anomaly boundary to prevent the model from learning anomalous patterns. Extensive experiments show that our method not only outperforms existing methods in the Flink scenario but also achieves state-of-the-art results on public benchmark datasets.
UR - https://www.scopus.com/pages/publications/105008757215
U2 - 10.14778/3717755.3717773
DO - 10.14778/3717755.3717773
M3 - Conference article
AN - SCOPUS:105008757215
SN - 2150-8097
VL - 18
SP - 1159
EP - 1168
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 4
Y2 - 1 September 2025 through 5 September 2025
ER -