Hybrid schema summarization method of large scale database

Xue Wang, Xuan Zhou*, Shan Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Complex schemas and a lack of documentation often make databases difficult to use. Some existing solutions attempt to identify the most important tables based on foreign key relationships and use these tables as a summary of the database schema. However, in real-world scenarios, the schema summaries generated by these approaches may fail to capture the subjects of the database. In this paper, we describe the limitations of the previous approaches and propose a principled method to summarize large-scale database schemas. First, we partition a database schema into communities using several community detection algorithms. Then, we integrate these results into a set of groups, each representing a subject. Finally, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Our approach is evaluated on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely and that the generated abstract schema layers help users explore a database.
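
The integration step described above (combining the outputs of several community detection algorithms into subject groups) can be illustrated with a minimal sketch. The abstract does not specify how the results are integrated, so this example swaps in a simple co-association (majority vote) consensus: two tables end up in the same group when more than half of the input partitions place them in the same community. The function name `consensus_groups`, the `threshold` parameter, and the table names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def consensus_groups(partitions, threshold=0.5):
    """Merge tables that share a community in more than `threshold`
    of the given partitions (an assumed rule, not the paper's method)."""
    tables = sorted({t for p in partitions for community in p for t in community})

    # Count how many partitions place each pair of tables together.
    together = defaultdict(int)
    for p in partitions:
        for community in p:
            for a, b in combinations(sorted(community), 2):
                together[(a, b)] += 1

    # Union-find over tables; agreeing pairs are merged transitively.
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), count in together.items():
        if count / len(partitions) > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in tables:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical outputs of three community detection runs on one schema.
p1 = [{"film", "actor", "director"}, {"album", "artist", "track"}]
p2 = [{"film", "actor"}, {"director", "album", "artist", "track"}]
p3 = [{"film", "actor", "director"}, {"album", "artist"}, {"track"}]

subject_groups = consensus_groups([p1, p2, p3])
print(subject_groups)
```

Because pairs are merged transitively, a chain of pairwise agreements can join tables that were never directly grouped together; a stricter rule would be needed if that behaviour is undesirable.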

Original language: English
Pages (from-to): 1616-1625
Number of pages: 10
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 36
Issue number: 8
State: Published - Aug 2013
Externally published: Yes

Keywords

  • Hybrid
  • Large-scale database
  • Schema
  • Subject group
  • Summarization
