Abstract
This paper introduces the design of the Chinese Web Infrastructure (CWI), a prototype system for analyzing forum data. CWI takes a best-effort approach: 1) it tries its best to extract or annotate semantics in the web data; 2) it provides flexible schemes that let users transform the web data into eXtensible Markup Language (XML) documents enriched with semantic annotations, which are better suited to further analytical tasks; and 3) it uses a distributed graph repository, called DISGR, as the backend for managing web data. The paper presents the design issues, reports the progress of the implementation, and discusses the research issues currently under study.
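As a rough illustration of point 2), the following sketch turns one forum post into an XML document that carries a separate list of entity annotations. Everything here is an assumption for illustration: the function `post_to_xml`, the element names (`post`, `author`, `time`, `body`, `entities`, `entity`), and the naive dictionary-lookup annotator are hypothetical. The abstract does not specify CWI's actual XML schema or semantic extraction method.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity dictionary: surface form -> semantic type.
ENTITY_TYPES = {"清华大学": "Organization", "北京": "Location"}


def post_to_xml(post: dict, entity_types: dict) -> str:
    """Wrap a forum post in XML and attach entity annotations (hypothetical schema)."""
    root = ET.Element("post", {"id": str(post["id"])})
    ET.SubElement(root, "author").text = post["author"]
    ET.SubElement(root, "time").text = post["time"]
    text = post["text"]
    ET.SubElement(root, "body").text = text

    # Naive annotator: find every occurrence of each dictionary entry.
    matches = []
    for surface, etype in entity_types.items():
        start = text.find(surface)
        while start != -1:
            matches.append((start, surface, etype))
            start = text.find(surface, start + len(surface))

    # Emit annotations in order of their character offset in the body.
    entities = ET.SubElement(root, "entities")
    for start, surface, etype in sorted(matches):
        ent = ET.SubElement(entities, "entity", {"type": etype, "offset": str(start)})
        ent.text = surface
    return ET.tostring(root, encoding="unicode")


post = {"id": 1, "author": "user42", "time": "2011-06-01", "text": "我在北京的清华大学读书"}
print(post_to_xml(post, ENTITY_TYPES))
```

Keeping the annotations in a separate `entities` element, rather than inlining tags inside the body text, leaves the original post untouched and makes the annotations easy to query; overlapping or ambiguous mentions are not resolved by this sketch.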
| Original language | English |
|---|---|
| Pages (from-to) | 388-396 |
| Number of pages | 9 |
| Journal | Frontiers of Electrical and Electronic Engineering in China |
| Volume | 6 |
| Issue number | 2 |
| State | Published - Jun 2011 |
Keywords
- Chinese Web infrastructure
- distributed storage
- graph data model
- semantic entity