TY - GEN
T1 - On disambiguating authors
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
AU - Li, Na
AU - Zhu, Renyu
AU - Zhou, Xiaoxu
AU - He, Xiangnan
AU - Cai, Wenyuan
AU - Gao, Ming
AU - Zhou, Aoying
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - Author disambiguation arises when different authors share the same name, which is a critical task in digital libraries, such as DBLP, CiteULike, CiteSeerX, etc. While the state-of-the-art methods have developed various paper embedding-based methods performing in a top-down manner, they primarily focus on the ego-network of a target name and overlook the low-quality collaborative relations existed in the ego-network. Thus, these methods can be suboptimal for disambiguating authors.In this paper, we model the author disambiguation as a collaboration network reconstruction problem, and propose an incremental and unsupervised author disambiguation method, namely IUAD, which performs in a bottom-up manner. Initially, we build a stable collaboration network based on stable collaborative relations. To further improve the recall, we build a probabilistic generative model to reconstruct the complete collaboration network. In addition, for newly published papers, we can incrementally judge who publish them via only computing the posterior probabilities. We have conducted extensive experiments on a large-scale DBLP dataset to evaluate IUAD. The experimental results demonstrate that IUAD not only achieves the promising performance, but also outperforms comparable baselines significantly. Codes are available at https://github.com/papergitgit/IUAD.
AB - Author disambiguation arises when different authors share the same name, which is a critical task in digital libraries, such as DBLP, CiteULike, CiteSeerX, etc. While the state-of-the-art methods have developed various paper embedding-based methods performing in a top-down manner, they primarily focus on the ego-network of a target name and overlook the low-quality collaborative relations existed in the ego-network. Thus, these methods can be suboptimal for disambiguating authors.In this paper, we model the author disambiguation as a collaboration network reconstruction problem, and propose an incremental and unsupervised author disambiguation method, namely IUAD, which performs in a bottom-up manner. Initially, we build a stable collaboration network based on stable collaborative relations. To further improve the recall, we build a probabilistic generative model to reconstruct the complete collaboration network. In addition, for newly published papers, we can incrementally judge who publish them via only computing the posterior probabilities. We have conducted extensive experiments on a large-scale DBLP dataset to evaluate IUAD. The experimental results demonstrate that IUAD not only achieves the promising performance, but also outperforms comparable baselines significantly. Codes are available at https://github.com/papergitgit/IUAD.
KW - Author Disambiguation
KW - Collaboration Network
KW - Exponential Family
KW - Probabilistic Generative Model
UR - https://www.scopus.com/pages/publications/85112865564
U2 - 10.1109/ICDE51399.2021.00082
DO - 10.1109/ICDE51399.2021.00082
M3 - 会议稿件
AN - SCOPUS:85112865564
T3 - Proceedings - International Conference on Data Engineering
SP - 888
EP - 899
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - IEEE Computer Society
Y2 - 19 April 2021 through 22 April 2021
ER -