TY - JOUR
T1 - Distributed top-k similarity query on big trajectory streams
AU - Zhang, Zhigang
AU - Qi, Xiaodong
AU - Wang, Yilin
AU - Jin, Cheqing
AU - Mao, Jiali
AU - Zhou, Aoying
N1 - Publisher Copyright:
© 2018, Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/6/1
Y1 - 2019/6/1
N2 - Recently, big trajectory data streams are generated in distributed environments with the popularity of smartphones and other mobile devices. Distributed top-k similarity query, which finds k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth resource. Although this query can be solved by sending the query trajectory to all the remote sites, in which the pairwise similarities are computed precisely. However, the overall cost, O(n · m), is huge when n or m is huge, where n is the size of query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter some trajectories in advance without precise computation. In order to overcome the challenge in this query, we devise two general frameworks, into which concrete distance measures can be plugged. The former one uses two bounds (the upper and lower bound), while the latter one only uses the lower bound. Moreover, we introduce detailed implementations of two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bound for the former framework and the lower bound for the latter one. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of proposed methods.
AB - Recently, big trajectory data streams are generated in distributed environments with the popularity of smartphones and other mobile devices. Distributed top-k similarity query, which finds k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth resource. Although this query can be solved by sending the query trajectory to all the remote sites, in which the pairwise similarities are computed precisely. However, the overall cost, O(n · m), is huge when n or m is huge, where n is the size of query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter some trajectories in advance without precise computation. In order to overcome the challenge in this query, we devise two general frameworks, into which concrete distance measures can be plugged. The former one uses two bounds (the upper and lower bound), while the latter one only uses the lower bound. Moreover, we introduce detailed implementations of two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bound for the former framework and the lower bound for the latter one. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of proposed methods.
KW - communication cost
KW - top-k similarity query
KW - trajectory stream
UR - https://www.scopus.com/pages/publications/85056094434
U2 - 10.1007/s11704-018-7234-6
DO - 10.1007/s11704-018-7234-6
M3 - 文章
AN - SCOPUS:85056094434
SN - 2095-2228
VL - 13
SP - 647
EP - 664
JO - Frontiers of Computer Science
JF - Frontiers of Computer Science
IS - 3
ER -