Abstract
Complex schemas and missing documentation often make databases difficult to use. Some existing solutions identify the most important tables from foreign key relationships and use these tables as a summary of the database schema. However, in real-world scenarios, the summaries generated by these approaches may fail to capture the subjects of the database. In this paper, we describe the limitations of previous approaches and propose a principled method for summarizing large-scale database schemas. First, we partition a database schema into communities using several community detection algorithms. Then, we integrate these results into a set of groups, each representing a subject. Finally, we cluster the subject groups into abstract domains to form a multi-level navigation structure. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach identifies subject groups precisely and that the generated abstract schema layers help users explore a database.
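The integration step, in which several community detection results are merged into subject groups, can be illustrated with a minimal sketch. The abstract does not say how the integration is done, so the majority-vote co-association rule used here is an assumption rather than the paper's method. The table names, the three input partitions, and the functions `connected_components` and `integrate_partitions` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def integrate_partitions(partitions, min_votes):
    """Assumed integration rule: two tables belong to the same subject
    group when at least `min_votes` partitions place them together."""
    votes = defaultdict(int)
    tables = set()
    for partition in partitions:
        for community in partition:
            tables.update(community)
            for a, b in combinations(sorted(community), 2):
                votes[(a, b)] += 1
    agreed = [pair for pair, count in votes.items() if count >= min_votes]
    return connected_components(sorted(tables), agreed)


# Hypothetical outputs of three community detection runs on a toy schema.
PARTITIONS = [
    [{"film", "film_actor", "actor"}, {"album", "track", "artist"}],
    [{"film", "film_actor"}, {"actor", "artist"}, {"album", "track"}],
    [{"film", "film_actor", "actor"}, {"artist", "album"}, {"track"}],
]

subject_groups = integrate_partitions(PARTITIONS, min_votes=2)
print(subject_groups)
```

With `min_votes=2`, the pair `actor`/`artist` is grouped by only one run and is therefore not merged, so the sketch yields one film-related group and one music-related group. The final step, clustering subject groups into abstract domains, is not covered here.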
| Original language | English |
|---|---|
| Pages (from-to) | 1616-1625 |
| Number of pages | 10 |
| Journal | Jisuanji Xuebao/Chinese Journal of Computers |
| Volume | 36 |
| Issue number | 8 |
| State | Published - Aug 2013 |
| Externally published | Yes |
Keywords
- Hybrid
- Large-scale database
- Schema
- Subject group
- Summarization