Abstract
This paper surveys data quality, with an emphasis on data cleaning. It describes the importance of data quality and the metrics used to measure it, defines and classifies data cleaning problems, and details approaches to solving data quality problems. It then gives an overview of how techniques from other research areas can be combined with data cleaning, introduces several previously proposed data cleaning frameworks, and discusses future research topics related to data cleaning.
| Original language | English |
|---|---|
| Pages (from-to) | 2076-2082 |
| Number of pages | 7 |
| Journal | Ruan Jian Xue Bao/Journal of Software |
| Volume | 13 |
| Issue number | 11 |
| State | Published - Nov 2002 |
| Externally published | Yes |
Keywords
- Data cleaning
- Data cleaning framework
- Data integration
- Data quality
- Duplicate record