TY - JOUR
T1 - Concept Drift Adaptation for Time Series Anomaly Detection via Transformer
AU - Ding, Chaoyue
AU - Zhao, Jing
AU - Sun, Shiliang
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/6
Y1 - 2023/6
N2 - Time series anomaly detection (TSAD) is an essential task in practical applications, such as data monitoring and network security detection. A common approach for anomaly detection is to use sequential models. As an effective sequence model, Transformer can capture the long-term dependencies of a time series and is expected to perform better on anomaly detection tasks. However, there are still problems to be addressed when using Transformer for anomaly detection. (1) Failing to adapt to concept drift: The vanilla Transformer assumes that the training and test data come from the same distribution. However, practical situations may often violate this assumption due to the time-varying nature of time-series data, which may lead to concept drift problems. (2) High computational complexity: The time complexity of the vanilla Transformer in the inference stage increases quadratically with the sequence length L. To solve the first problem, we propose the concept drift adaptation method (CDAM), a kind of distribution adaptation method, to dynamically tune the learning rate of Transformer. CDAM aims to fully utilize the old concept data to optimize a new model on the new concept data through an online learning strategy. To address the second problem, we propose the root square sparse self-attention, which requires only O(L√L) time complexity. Experimental results on several anomaly detection benchmarks show that our model outperforms many anomaly detection methods, especially in time series with concept drift.
AB - Time series anomaly detection (TSAD) is an essential task in practical applications, such as data monitoring and network security detection. A common approach for anomaly detection is to use sequential models. As an effective sequence model, Transformer can capture the long-term dependencies of a time series and is expected to perform better on anomaly detection tasks. However, there are still problems to be addressed when using Transformer for anomaly detection. (1) Failing to adapt to concept drift: The vanilla Transformer assumes that the training and test data come from the same distribution. However, practical situations may often violate this assumption due to the time-varying nature of time-series data, which may lead to concept drift problems. (2) High computational complexity: The time complexity of the vanilla Transformer in the inference stage increases quadratically with the sequence length L. To solve the first problem, we propose the concept drift adaptation method (CDAM), a kind of distribution adaptation method, to dynamically tune the learning rate of Transformer. CDAM aims to fully utilize the old concept data to optimize a new model on the new concept data through an online learning strategy. To address the second problem, we propose the root square sparse self-attention, which requires only O(L√L) time complexity. Experimental results on several anomaly detection benchmarks show that our model outperforms many anomaly detection methods, especially in time series with concept drift.
KW - Concept drift
KW - Time series anomaly detection
KW - Transfer learning
KW - Transformer
UR - https://www.scopus.com/pages/publications/85138271124
U2 - 10.1007/s11063-022-11015-0
DO - 10.1007/s11063-022-11015-0
M3 - Article
AN - SCOPUS:85138271124
SN - 1370-4621
VL - 55
SP - 2081
EP - 2101
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 3
ER -