Incremental Mining of the Schema of Semistructured Data

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Semistructured data have no fixed, rigid schema, although some implicit structure typically appears in the data. The large number of on-line applications makes it important to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The central problem is discovering the hidden structure in the data. Current methods for extracting the structure of Web data are either general and independent of any application background, or tied to a concrete environment such as HTML or XML. Both kinds are expensive and have difficulty keeping up with the frequent and complex changes in Web data. This paper discusses the problem of incrementally mining the schema of semistructured data after the raw data have been updated. It presents an algorithm for incremental schema mining, together with experimental results showing that incremental mining of semistructured data is more efficient than non-incremental mining.

Original language: English
Pages (from-to): 241-248
Number of pages: 8
Journal: Journal of Computer Science and Technology
Volume: 15
Issue number: 3
State: Published - May 2000
Externally published: Yes

Keywords

  • Algorithm
  • Data mining
  • Incremental mining
  • Schema
  • Semistructured data
