TY - GEN
T1 - Graph Summarization via Node Grouping
T2 - 16th ACM International Conference on Web Search and Data Mining, WSDM 2023
AU - Merchant, Arpit
AU - Mathioudakis, Michael
AU - Wang, Yanhao
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/2/27
Y1 - 2023/2/27
N2 - Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
AB - Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
KW - clustering
KW - graph summarization
KW - spectral algorithms
UR - https://www.scopus.com/pages/publications/85149656670
U2 - 10.1145/3539597.3570441
DO - 10.1145/3539597.3570441
M3 - 会议稿件
AN - SCOPUS:85149656670
T3 - WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining
SP - 742
EP - 750
BT - WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
Y2 - 27 February 2023 through 3 March 2023
ER -