Abstract
Pattern matching over big data has gained momentum in recent years. Many real-time applications match patterns over high-volume data to discover potential trends, and for these applications real-time response and concurrent processing are the key performance metrics. However, matching efficiently over live streaming data is challenging because of (i) the high data volume, (ii) the real-time response requirement, and (iii) concurrent matching queries. To address these challenges, we introduce a pattern model that appends a timestamp set to reduce the number of repeated patterns, and we propose FastPM, a distributed stream processing framework for high-speed real-time data. The framework combines synchronous and asynchronous mechanisms to handle multiple matching queries simultaneously and applies multiple techniques to improve the efficiency of pattern matching. We implement FastPM and evaluate its performance on billions of real-world web-click records. Our empirical results demonstrate the effectiveness of FastPM on matching queries and pattern updates. On average, FastPM responds to a matching query in 0.2 s and to an update request in 0.03 s. Furthermore, FastPM supports 5000 simultaneous matching queries with an average query latency of 1.3 s.
| Original language | English |
|---|---|
| Pages (from-to) | 263-280 |
| Number of pages | 18 |
| Journal | Information Sciences |
| Volume | 453 |
| DOI | |
| Publication status | Published - Jul 2018 |
| Published externally | Yes |