A spectral method to separate disconnected and nearly-disconnected Web graph components

  • Chris H.Q. Ding*
  • , Xiaofeng He
  • , Hongyuan Zha
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

44 Scopus citations

Abstract

Separation of connected components from a graph with disconnected graph components mostly use breadth-first search (BFS) or depth-first search (DFS) graph algorithms. Here we propose a new algebraic method to separate disconnected and nearly-disconnected components. This method is based on spectral graph partitioning, following a key observation that disconnected components will show up, after properly sorted, as step-function like curve in the lowest eigenvectors of the Laplacian matrix of the graph. Following an perturbative analysis framework, we systematically analyzed the graph structures, first on the disconnected subgraph ease, and second on the effects of adding edges sparsely connecting different subgraphs as a perturbation. Several new results are derived, providing insights to spectral methods and related clustering objective function. Examples are given illustrating the concepts and results our methods. Comparing to the standard graph algorithms, this method has the same O(|E| + |V|log(|V|)) complexity, but is easier to implement (using readily available eigensolvers). Further more the method can easily identify articulation points and bridges on nearly-disconnected graphs. Segmentation of a real example of Web graph for query amazon is given. We found that each disconnected or nearly-disconnected components forms a cluster on a clear topic.

Original languageEnglish
Title of host publicationProceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
EditorsF. Provost, R. Srikant, M. Schkolnick, D. Lee
PublisherAssociation for Computing Machinery (ACM)
Pages275-280
Number of pages6
ISBN (Print)158113391X, 9781581133912
DOIs
StatePublished - 2001
Externally publishedYes
EventProceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) - San Francisco, CA, United States
Duration: 26 Aug 200129 Aug 2001

Publication series

NameProceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

ConferenceProceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001)
Country/TerritoryUnited States
CitySan Francisco, CA
Period26/08/0129/08/01

Fingerprint

Dive into the research topics of 'A spectral method to separate disconnected and nearly-disconnected Web graph components'. Together they form a unique fingerprint.

Cite this