TY - GEN
T1 - Filtering duplicate items over distributed data streams
AU - Xia, Tian
AU - Jin, Cheqing
AU - Zhou, Xiaofang
AU - Zhou, Aoying
PY - 2005
Y1 - 2005
AB - In recent years, many real-time applications need to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push it to a central stream processor. In such environments, transmitting rapid, high-volume, time-varying data streams incurs significant communication cost and also imposes computational overhead on the central processor. In this paper, we develop a novel filtering approach, called DTFilter, for evaluating windowed distinct queries in such a distributed system. DTFilter is based on a search algorithm that uses a data structure of two height-balanced trees. It avoids transmitting duplicate items in the data streams and thereby saves considerable network resources. We also provide a theoretical analysis of the time required to perform the search and of the amount of memory needed. Extensive experiments show that DTFilter achieves high performance.
UR - https://www.scopus.com/pages/publications/33646529080
U2 - 10.1007/11563952_80
DO - 10.1007/11563952_80
M3 - Conference contribution
AN - SCOPUS:33646529080
SN - 3540292276
SN - 9783540292272
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 779
EP - 784
BT - Advances in Web-Age Information Management - 6th International Conference, WAIM 2005, Proceedings
PB - Springer Verlag
T2 - 6th International Conference on Advances in Web-Age Information Management, WAIM 2005
Y2 - 11 October 2005 through 13 October 2005
ER -