TY - JOUR
T1 - 基于滑动窗口的分布式轨迹流聚类
AU - Mao, Jia Li
AU - Chen, He
AU - Song, Qiu Ge
AU - Jin, Che Qing
AU - Zhou, Ao Ying
N1 - Publisher Copyright:
© 2018, Science Press. All right reserved.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - With the widespread application of modern mobile devices and the vigorous development of location acquisition technologies, numerous moving objects have been relaying their locations continuously. Such the location information involves the geographic coordinates, speed, direction, as well as timestamps, etc. They are collected in real time and the amount of data continues to increase, and hence a tremendous amount of distributed trajectory data streams are generated. This necessitates to conduct the analysis task like clustering upon the distributed trajectory data streams timely and effectively to gain insights about the common moving trends of the moving objects. Due to the massive volume, high velocity, skewness distribution, time-varying evolution property of the streaming trajectories, the existing trajectory clustering techniques on the static trajectory data set cannot be directly applied to the distributed trajectory data streams. In addition, the key issue also comes from the strict space- and time-complexities of processing the continuously arrived trajectory data, combined with the concept drift that emerged in the streaming cases. Therefore, the study of clustering upon the distributed trajectory streams have faced the huge challenges, which makes this topic is still in its early stage of exploration. To address the above mentioned issues, it is imperative to develop the efficient parallel clustering analysis tasks, and exploit the effective communication mechanisms that minimize the transmission overhead to meet the real-time clustering requirement with low processing latency in processing the trajectory streams that collected in geographically dispersed manner. In this paper, we firstly present a novel distributed synopsis structure to extract the clustering characteristic of the trajectory cluster and further keep the track of the evolving distributed trajectory data streams. On the basis of that, with the aim of reducing the communication overhead and improving the clustering performance upon the distributed trajectory data streams, we develop an incremental algorithm for online clustering upon the distributed trajectory streams (called OCluDTS). OCluDTS algorithm leverages the sliding window model and consists of two phases. At the first phase (called as the local clustering phase), the most recent arrived sets of trajectories at each time instant in current window for all the remote site are conducted clustering in parallel to obtain the local clustering results. At the second phase (called as the local clustering results merging phase), the local clustering results of all the remote sites are transferred to the coordinator to participate in the re-clustering process to derive the final clustering results. This two-phase clustering mechanism could guarantee achieving the same high clustering precision as the centralized solution. Moreover, the pruning mechanism of similarity calculation, and the optimization strategy that testing first and transferring later enable OCluDTS algorithm to further boost the efficiency. Finally, we conduct extensive experiments on the real data set to evaluate the effectiveness and efficiency of OCluDTS through comparing it with the centralized algorithm. Theoretical analysis and experimental results show that OCluDTS can achieve the superior performance in clustering on the massive amounts of streaming trajectories.
AB - With the widespread application of modern mobile devices and the vigorous development of location acquisition technologies, numerous moving objects have been relaying their locations continuously. Such the location information involves the geographic coordinates, speed, direction, as well as timestamps, etc. They are collected in real time and the amount of data continues to increase, and hence a tremendous amount of distributed trajectory data streams are generated. This necessitates to conduct the analysis task like clustering upon the distributed trajectory data streams timely and effectively to gain insights about the common moving trends of the moving objects. Due to the massive volume, high velocity, skewness distribution, time-varying evolution property of the streaming trajectories, the existing trajectory clustering techniques on the static trajectory data set cannot be directly applied to the distributed trajectory data streams. In addition, the key issue also comes from the strict space- and time-complexities of processing the continuously arrived trajectory data, combined with the concept drift that emerged in the streaming cases. Therefore, the study of clustering upon the distributed trajectory streams have faced the huge challenges, which makes this topic is still in its early stage of exploration. To address the above mentioned issues, it is imperative to develop the efficient parallel clustering analysis tasks, and exploit the effective communication mechanisms that minimize the transmission overhead to meet the real-time clustering requirement with low processing latency in processing the trajectory streams that collected in geographically dispersed manner. In this paper, we firstly present a novel distributed synopsis structure to extract the clustering characteristic of the trajectory cluster and further keep the track of the evolving distributed trajectory data streams. On the basis of that, with the aim of reducing the communication overhead and improving the clustering performance upon the distributed trajectory data streams, we develop an incremental algorithm for online clustering upon the distributed trajectory streams (called OCluDTS). OCluDTS algorithm leverages the sliding window model and consists of two phases. At the first phase (called as the local clustering phase), the most recent arrived sets of trajectories at each time instant in current window for all the remote site are conducted clustering in parallel to obtain the local clustering results. At the second phase (called as the local clustering results merging phase), the local clustering results of all the remote sites are transferred to the coordinator to participate in the re-clustering process to derive the final clustering results. This two-phase clustering mechanism could guarantee achieving the same high clustering precision as the centralized solution. Moreover, the pruning mechanism of similarity calculation, and the optimization strategy that testing first and transferring later enable OCluDTS algorithm to further boost the efficiency. Finally, we conduct extensive experiments on the real data set to evaluate the effectiveness and efficiency of OCluDTS through comparing it with the centralized algorithm. Theoretical analysis and experimental results show that OCluDTS can achieve the superior performance in clustering on the massive amounts of streaming trajectories.
KW - Clustering
KW - Concept drift
KW - Distributed temporal trajectory cluster feature
KW - Distributed trajectory streams
KW - Sliding window model
UR - https://www.scopus.com/pages/publications/85055432690
U2 - 10.11897/SP.J.1016.2018.02120
DO - 10.11897/SP.J.1016.2018.02120
M3 - 文章
AN - SCOPUS:85055432690
SN - 0254-4164
VL - 41
SP - 2120
EP - 2133
JO - Jisuanji Xuebao/Chinese Journal of Computers
JF - Jisuanji Xuebao/Chinese Journal of Computers
IS - 9
ER -