TY - GEN
T1 - Top-k spatio-textual similarity join
AU - Hu, Huiqi
AU - Li, Guoliang
AU - Bao, Zhifeng
AU - Feng, Jianhua
AU - Wu, Yongwei
AU - Gong, Zhiguo
AU - Xu, Yaoqiang
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/6/22
Y1 - 2016/6/22
N2 - With the rapid development of mobile Internet technology, Internet users are shifting from desktop to mobile devices. Modern mobile devices (e.g., smartphones and tablets) are equipped with GPS, which helps users easily obtain their locations, and location-based services (LBS) have been widely deployed. LBS users are generating more and more spatio-textual data, which contains both textual descriptions and geographical locations. In user-generated data, a spatio-textual entity may have different representations, possibly due to GPS deviations or typographical errors [6], [2], which calls for effective methods to integrate spatio-textual data from different data sources. A spatio-textual similarity join is an important operation in spatio-textual data integration: given two sets of spatio-textual objects, it finds all similar pairs from the two sets, where the similarity can be quantified by combining spatial proximity and textual relevancy. Spatio-textual similarity joins have many applications, e.g., user recommendation in location-based social networks, image duplication detection using spatio-textual tags, spatio-textual advertising, and location-based market analysis [6], [2]. For example, a house rental agency (e.g., rent.com) wants to perform a similarity join between the spatio-textual data of house requirements from renters and the data of house properties from owners. As another example, a startup company such as Factual (factual.com) crawls spatio-textual records to generate points of interest (POIs). As the records come from multiple sources and may contain many duplicates, it needs to run similarity joins to remove the duplicates.
AB - With the rapid development of mobile Internet technology, Internet users are shifting from desktop to mobile devices. Modern mobile devices (e.g., smartphones and tablets) are equipped with GPS, which helps users easily obtain their locations, and location-based services (LBS) have been widely deployed. LBS users are generating more and more spatio-textual data, which contains both textual descriptions and geographical locations. In user-generated data, a spatio-textual entity may have different representations, possibly due to GPS deviations or typographical errors [6], [2], which calls for effective methods to integrate spatio-textual data from different data sources. A spatio-textual similarity join is an important operation in spatio-textual data integration: given two sets of spatio-textual objects, it finds all similar pairs from the two sets, where the similarity can be quantified by combining spatial proximity and textual relevancy. Spatio-textual similarity joins have many applications, e.g., user recommendation in location-based social networks, image duplication detection using spatio-textual tags, spatio-textual advertising, and location-based market analysis [6], [2]. For example, a house rental agency (e.g., rent.com) wants to perform a similarity join between the spatio-textual data of house requirements from renters and the data of house properties from owners. As another example, a startup company such as Factual (factual.com) crawls spatio-textual records to generate points of interest (POIs). As the records come from multiple sources and may contain many duplicates, it needs to run similarity joins to remove the duplicates.
UR - https://www.scopus.com/pages/publications/84980416753
U2 - 10.1109/ICDE.2016.7498433
DO - 10.1109/ICDE.2016.7498433
M3 - Conference contribution
AN - SCOPUS:84980416753
T3 - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
SP - 1576
EP - 1577
BT - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Conference on Data Engineering, ICDE 2016
Y2 - 16 May 2016 through 20 May 2016
ER -