TY - JOUR
T1 - SCHAIN-IRAM
T2 - An Efficient and Effective Semi-Supervised Clustering Algorithm for Attributed Heterogeneous Information Networks
AU - Li, Xiang
AU - Wu, Yao
AU - Ester, Martin
AU - Kao, Ben
AU - Wang, Xin
AU - Zheng, Yudian
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2022/4/1
Y1 - 2022/4/1
AB - A heterogeneous information network (HIN) is one whose nodes model objects of different types and whose links model objects' relationships. To enrich its information, objects in an HIN are typically associated with additional attributes. We call such an HIN an Attributed HIN or AHIN. We study the problem of clustering objects in an AHIN, taking into account objects' similarities with respect to both object attribute values and their structural connectedness in the network. We show how supervision signal, expressed in the form of a must-link set and a cannot-link set, can be leveraged to improve clustering results. We put forward the SCHAIN algorithm to solve the clustering problem, and two highly efficient variants, SCHAIN-PI and SCHAIN-IRAM, which employ the power iteration based method and the implicitly restarted Arnoldi method respectively to compute eigenvectors of a matrix. We conduct extensive experiments comparing SCHAIN-based algorithms with other state-of-the-art clustering algorithms. Our results show that SCHAIN-IRAM outperforms other competitors in terms of clustering effectiveness and is highly efficient.
KW - Semi-supervised clustering
KW - attributed heterogeneous information network
KW - network structure
KW - object attributes
UR - https://www.scopus.com/pages/publications/85126560326
U2 - 10.1109/TKDE.2020.2997938
DO - 10.1109/TKDE.2020.2997938
M3 - Article
AN - SCOPUS:85126560326
SN - 1041-4347
VL - 34
SP - 1980
EP - 1992
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
ER -