Clustering DTDs: An interactive two-level approach

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

XML (eXtensible Markup Language) is a standard widely used for data representation and data exchange. However, the DTD (Document Type Definition), an important part of XML, is not fully exploited in current applications. This paper presents a new method for clustering DTDs that can also be applied to XML document clustering. The method works on two levels: it clusters the elements of the DTDs and clusters the DTDs themselves in separate stages. Element clustering forms the first level and produces element clusters, each of which generalizes a group of related elements. DTD clustering forms the second level of the overall process and makes use of this generalized element information. The two-level method has the following advantages: 1) it takes both the content and the structure of DTDs into account; 2) the generalized information about elements is more useful than the isolated words used in the vector space model; 3) it makes outliers easier to find. Experiments show that the method categorizes related DTDs effectively.
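The two-level idea can be illustrated with a small sketch. This is not the authors' algorithm: it assumes each DTD is reduced to a plain list of element names (the hypothetical `DTDS` toy data), uses `difflib` string similarity as a stand-in for the paper's element similarity measure, ignores DTD structure entirely, and swaps in simple threshold-based single-link clustering at both levels. The thresholds, the function names (`two_level_cluster`, `single_link`, `name_sim`) and the rule "a DTD left alone in its own cluster is an outlier" are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy input: each DTD is reduced to the element names it declares.
DTDS = {
    "book_a": ["book", "title", "author", "publisher"],
    "book_b": ["books", "booktitle", "authors", "publisher"],
    "order_a": ["order", "customer", "item", "price"],
    "order_b": ["orders", "customer", "items", "price"],
    "recipe": ["recipe", "ingredient", "step"],
}

def name_sim(a, b):
    """Similarity of two element names in [0, 1]; a stand-in, not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def single_link(items, sim, threshold):
    """Single-link clustering: join any two items whose similarity reaches threshold."""
    parent = list(range(len(items)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if sim(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def two_level_cluster(dtds, elem_threshold=0.8, dtd_threshold=0.5):
    # Level 1: cluster the distinct element names found across all DTDs.
    names = sorted({n for elems in dtds.values() for n in elems})
    elem_clusters = single_link(names, name_sim, elem_threshold)
    cluster_of = {n: cid for cid, c in enumerate(elem_clusters) for n in c}

    # Level 2: describe each DTD by counts over element clusters (the generalized
    # information) instead of raw names, then cluster the DTDs on those vectors.
    vectors = {}
    for dtd, elems in dtds.items():
        vec = {}
        for n in elems:
            vec[cluster_of[n]] = vec.get(cluster_of[n], 0) + 1
        vectors[dtd] = vec
    dtd_clusters = single_link(
        sorted(dtds), lambda a, b: cosine(vectors[a], vectors[b]), dtd_threshold
    )
    # Assumption: a DTD that ends up alone in its cluster is reported as an outlier.
    outliers = [c[0] for c in dtd_clusters if len(c) == 1]
    return elem_clusters, dtd_clusters, outliers

if __name__ == "__main__":
    elem_clusters, dtd_clusters, outliers = two_level_cluster(DTDS)
    print(dtd_clusters)  # [['book_a', 'book_b'], ['order_a', 'order_b'], ['recipe']]
    print(outliers)      # ['recipe']
```

Because `author`/`authors`, `book`/`books` and `order`/`orders` fall into the same element clusters at the first level, `book_a` and `book_b` (and the two order DTDs) share most of their generalized features even though their raw element names differ, which is the kind of benefit over isolated words that the abstract describes.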

Original language: English
Pages (from-to): 807-819
Number of pages: 13
Journal: Journal of Computer Science and Technology
Volume: 17
Issue number: 6
State: Published, Nov 2002
Externally published: Yes

Keywords

  • Clustering
  • DTD (Document Type Definition)
  • XML (eXtensible Markup Language)

