Summarizing large-scale database schema using community detection

  • Xue Wang*
  • Xuan Zhou
  • Shan Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

10 Scopus citations

Abstract

Schema summarization on large-scale databases is a challenge. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, as a schema can be very big, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social networks. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely. The generated abstract schema layers are very helpful for users exploring the database.
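
The first and third steps can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a hypothetical toy schema (FOREIGN_KEYS), uses a naive greedy modularity merge as a stand-in for the community detection step, and picks the table with the most links inside its group as a stand-in for the representative table. The second step (clustering subject groups into abstract domains) is omitted. All function and table names are illustrative.

```python
from itertools import combinations

# Hypothetical toy schema: tables are nodes, foreign-key links are edges.
FOREIGN_KEYS = [
    ("film", "actor"), ("film", "director"), ("film", "performance"),
    ("performance", "actor"),
    ("album", "artist"), ("album", "track"), ("album", "recording"),
    ("track", "recording"), ("artist", "track"),
    ("actor", "artist"),  # the only link between the two topics
]


def detect_subject_groups(edges):
    """Naive greedy modularity merge (assumed stand-in for step 1).

    Starts with one group per table and repeatedly merges the pair of
    groups with the largest positive modularity gain.
    """
    m = len(edges)
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    groups = [frozenset([t]) for t in sorted(deg)]
    while True:
        best_gain, best_pair = 0.0, None
        for g1, g2 in combinations(groups, 2):
            # Number of foreign-key links running between the two groups.
            links = sum(1 for a, b in edges
                        if (a in g1 and b in g2) or (a in g2 and b in g1))
            d1 = sum(deg[t] for t in g1)
            d2 = sum(deg[t] for t in g2)
            # Change in modularity if g1 and g2 were merged.
            gain = links / m - d1 * d2 / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (g1, g2)
        if best_pair is None:
            return groups
        g1, g2 = best_pair
        groups = [g for g in groups if g not in best_pair] + [g1 | g2]


def representative_table(group, edges):
    """Assumed stand-in for step 3: table with most links inside the group."""
    def internal_degree(table):
        return sum(1 for a, b in edges
                   if table in (a, b) and a in group and b in group)
    # Sorting first makes ties resolve alphabetically.
    return max(sorted(group), key=internal_degree)


def summarize(edges):
    """Map each group's representative table to the tables it labels."""
    groups = detect_subject_groups(edges)
    return {representative_table(g, edges): sorted(g) for g in groups}


summary = summarize(FOREIGN_KEYS)
for label, tables in summary.items():
    print(label, "->", tables)
```

The pairwise scan is quadratic in the number of groups per merge, so this form only suits small examples. A schema on the scale of Freebase would need an optimized community detection implementation.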

Original language: English
Pages (from-to): 515-526
Number of pages: 12
Journal: Journal of Computer Science and Technology
Volume: 27
Issue number: 3
State: Published - Jan 2012
Externally published: Yes

Keywords

  • Community detection
  • Large scale
  • Schema
  • Summarization
