TY - GEN
T1 - Parallel subgraph listing in a large-scale graph
AU - Shao, Yingxia
AU - Cui, Bin
AU - Chen, Lei
AU - Ma, Lin
AU - Yao, Junjie
AU - Xu, Ning
PY - 2014
Y1 - 2014
N2 - Subgraph listing is a fundamental operation to many graph and network analyses. The problem itself is computationally expensive and is well-studied in centralized processing algorithms. However, the centralized solutions cannot scale well to large graphs. Recently, several parallel approaches are introduced to handle the large graphs. Unfortunately, these parallel approaches still rely on the expensive join operations, thus cannot achieve high performance. In this paper, we design a novel parallel subgraph listing framework, named PSgL. The PSgL iteratively enumerates subgraph instances and solves the subgraph listing in a divide-and-conquer fashion. The framework completely relies on the graph traversal, and avoids the explicit join operation. Moreover, in order to improve its performance, we propose several solutions to balance the workload and reduce the size of intermediate results. Specially, we prove the problem of partial subgraph instance distribution for workload balance is NP-hard, and carefully design a set of heuristic strategies. To further reduce the enormous intermediate results, we introduce three independent mechanisms, which are automorphism breaking of the pattern graph, initial pattern vertex selection based on a cost model, and a pruning method based on a light-weight index. We have implemented the prototype of PSgL, and run compre- hensive experiments of various graph listing operations on diverse large graphs. The experiments clearly demonstrate that PSgL is robust and can achieve performance gain over the state-of-the-art solutions up to 90%.
AB - Subgraph listing is a fundamental operation to many graph and network analyses. The problem itself is computationally expensive and is well-studied in centralized processing algorithms. However, the centralized solutions cannot scale well to large graphs. Recently, several parallel approaches are introduced to handle the large graphs. Unfortunately, these parallel approaches still rely on the expensive join operations, thus cannot achieve high performance. In this paper, we design a novel parallel subgraph listing framework, named PSgL. The PSgL iteratively enumerates subgraph instances and solves the subgraph listing in a divide-and-conquer fashion. The framework completely relies on the graph traversal, and avoids the explicit join operation. Moreover, in order to improve its performance, we propose several solutions to balance the workload and reduce the size of intermediate results. Specially, we prove the problem of partial subgraph instance distribution for workload balance is NP-hard, and carefully design a set of heuristic strategies. To further reduce the enormous intermediate results, we introduce three independent mechanisms, which are automorphism breaking of the pattern graph, initial pattern vertex selection based on a cost model, and a pruning method based on a light-weight index. We have implemented the prototype of PSgL, and run compre- hensive experiments of various graph listing operations on diverse large graphs. The experiments clearly demonstrate that PSgL is robust and can achieve performance gain over the state-of-the-art solutions up to 90%.
KW - Graph
KW - Graph algorithm
KW - Graph enumeration
KW - Subgraph listing
UR - https://www.scopus.com/pages/publications/84904323776
U2 - 10.1145/2588555.2588557
DO - 10.1145/2588555.2588557
M3 - 会议稿件
AN - SCOPUS:84904323776
SN - 9781450323765
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 625
EP - 636
BT - SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Y2 - 22 June 2014 through 27 June 2014
ER -