Automatic extraction rules generation based on XPath pattern learning

Jingwei Zhang, Can Zhang, Weining Qian, Aoying Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Web forums have become important information sources on the Web due to their rich content contributed by millions of Internet users every day. Data extraction from Web pages is a key but cumbersome step in data analysis because it requires significant human intervention. Web forums have fairly regular structures, which allows us to generate extraction rules automatically according to their paths. In this paper, we introduce formal expressions for XPath patterns and pattern mapping rules, and propose machine learning methods to generate extraction rules for automatic data extraction from Web forums. Experimental results on real-life Web forums show good feasibility and accuracy for forum data.
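The idea that regular page structure lets extraction rules be derived from node paths can be illustrated with a minimal sketch. This is not the learning method of the paper, which the abstract does not specify. It is a simple generalization heuristic, assuming a few hand-labeled example paths that share the same tag sequence. The names `split_path`, `generalize`, and the sample page are hypothetical, and only the limited XPath subset supported by Python's `xml.etree.ElementTree` is used.

```python
import re
import xml.etree.ElementTree as ET

# One path step: a tag name with an optional positional index, e.g. "div[2]".
STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def split_path(path):
    """Split 'body/div[2]/span' into [('body', None), ('div', 2), ('span', None)]."""
    steps = []
    for raw in path.strip("/").split("/"):
        match = STEP.match(raw)
        if match is None:
            raise ValueError(f"unsupported step: {raw!r}")
        tag, index = match.groups()
        steps.append((tag, int(index) if index else None))
    return steps


def generalize(paths):
    """Merge labeled example paths into one pattern.

    An index is kept where every example agrees on it and dropped where the
    examples differ, so the resulting step matches all sibling elements.
    """
    parsed = [split_path(p) for p in paths]
    first_tags = [tag for tag, _ in parsed[0]]
    for steps in parsed:
        if [tag for tag, _ in steps] != first_tags:
            raise ValueError("examples must share the same tag sequence")
    pattern = []
    for i, (tag, index) in enumerate(parsed[0]):
        agrees = all(steps[i][1] == index for steps in parsed)
        pattern.append(f"{tag}[{index}]" if agrees and index is not None else tag)
    return "./" + "/".join(pattern)


# Hypothetical forum page: three posts with the same internal structure.
PAGE = """
<html><body>
  <div><span>alice</span><p>first post</p></div>
  <div><span>bob</span><p>second post</p></div>
  <div><span>carol</span><p>third post</p></div>
</body></html>
""".strip()

# Two labeled author nodes, given as paths relative to the root element.
examples = ["body/div[1]/span[1]", "body/div[2]/span[1]"]
pattern = generalize(examples)

root = ET.fromstring(PAGE)
authors = [node.text for node in root.findall(pattern)]
print(pattern, authors)
```

Two labeled posts are enough here for the pattern to cover the third, unlabeled post, because the index that varies between the examples is dropped. Real forum pages are rarely well-formed XML, so a practical pipeline would need an HTML parser and a more robust way to choose among competing patterns.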

Original language: English
Title of host publication: Web Information Systems Engineering - WISE 2010 Workshops - WISE 2010 International Symposium WISS and International Workshops CISE, MBC, Revised Selected Papers
Pages: 58-69
Number of pages: 12
State: Published - 2011
Event: Workshops on Web Information Systems Engineering, WISE 2010: 1st International Symposium on Web Intelligent Systems and Services, WISS 2010, 2nd International Workshop on Mobile Business Collaboration, MBC 2010 and 1st Int. Workshop on CISE 2010 - Hong Kong, China
Duration: 12 Dec 2010 to 14 Dec 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6724 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshops on Web Information Systems Engineering, WISE 2010: 1st International Symposium on Web Intelligent Systems and Services, WISS 2010, 2nd International Workshop on Mobile Business Collaboration, MBC 2010 and 1st Int. Workshop on CISE 2010
Country/Territory: China
City: Hong Kong
Period: 12/12/10 to 14/12/10

Keywords

  • Web forum
  • data extraction
  • mapping rule
