UHRP: Uncertainty-Based Pruning Method for Anonymized Data Linear Regression

Kun Liu, Wenyan Liu, Junhong Cheng, Xingjian Lu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Anonymization, a privacy-protection technique for data publishing, has been studied extensively over the past twenty years. However, less research has addressed how to make better use of anonymized data for data mining. In this paper, we focus on training a regression model on anonymized data and using the trained model to make predictions on original samples. Anonymized training instances are generally treated as hyper-rectangles, which distinguishes this setting from most machine learning tasks. We propose several hyper-rectangle vectorization methods that are compatible with both anonymized data and original data for model training. Because anonymization introduces additional uncertainty, we propose an Uncertainty-based Hyper-Rectangle Pruning method (UHRP) to reduce the disturbance caused by anonymized data. In this method, hyper-rectangles are pruned according to their global uncertainty, which is calculated from all uncertain attributes. Experiments show that, under specific conditions, a linear regressor trained on anonymized data can be expected to perform as well as one trained on original data. The experimental results also show that our pruning method can further improve the model's performance.
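The abstract does not state how hyper-rectangles are vectorized or how global uncertainty is computed, so the following is only a minimal sketch under stated assumptions, not the authors' UHRP implementation. It assumes midpoint vectorization (each generalized interval maps to its centre, so an original value v, viewed as the degenerate interval (v, v), maps to v itself), a hypothetical uncertainty score equal to the mean interval width normalized by each attribute's span, and a fixed pruning threshold. The names `vectorize`, `global_uncertainty`, `prune`, `fit_ols`, and `predict` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (low, high) bounds of one generalized attribute


def vectorize(record: List[Interval]) -> List[float]:
    """Assumed midpoint vectorization: each interval maps to its centre.
    A raw value v is the degenerate interval (v, v) and maps to v, so the
    same model can be applied to original (non-anonymized) samples."""
    return [(lo + hi) / 2.0 for lo, hi in record]


def global_uncertainty(record: List[Interval], spans: List[float]) -> float:
    """Hypothetical global uncertainty: mean interval width over all
    attributes, each width normalized by that attribute's span in the data."""
    widths = [(hi - lo) / s if s > 0 else 0.0 for (lo, hi), s in zip(record, spans)]
    return sum(widths) / len(widths)


def prune(records: List[List[Interval]], targets: List[float], threshold: float):
    """Drop hyper-rectangles whose global uncertainty exceeds `threshold`."""
    spans = []
    for j in range(len(records[0])):
        lows = [r[j][0] for r in records]
        highs = [r[j][1] for r in records]
        spans.append(max(highs) - min(lows))
    kept = [(r, y) for r, y in zip(records, targets)
            if global_uncertainty(r, spans) <= threshold]
    return [r for r, _ in kept], [y for _, y in kept]


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting.
    A bias column of ones is prepended, so w[0] is the intercept."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(d):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][d] / A[i][i] for i in range(d)]


def predict(w: List[float], x: List[float]) -> float:
    """Linear prediction w[0] + sum_i w[i+1] * x[i]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


if __name__ == "__main__":
    # Toy anonymized training set with one attribute; the last record is
    # so coarsely generalized (and noisy) that it should be pruned.
    records = [[(0.0, 2.0)], [(2.0, 4.0)], [(4.0, 4.0)], [(5.0, 5.0)], [(0.0, 10.0)]]
    targets = [3.0, 7.0, 9.0, 11.0, 20.0]
    kept, ys = prune(records, targets, threshold=0.5)
    w = fit_ols([vectorize(r) for r in kept], ys)  # approximately [1.0, 2.0]
    # Predict on an original sample, represented as a degenerate interval.
    print(w, predict(w, vectorize([(6.0, 6.0)])))
```

In this toy run the pruning step removes only the record spanning the whole attribute range, and the remaining midpoints lie on a single line, which illustrates why discarding highly uncertain hyper-rectangles can help the regressor. Other vectorizations (for example, concatenating interval bounds) and other uncertainty measures are equally plausible readings of the abstract.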

Original language: English
Title of host publication: Database Systems for Advanced Applications - DASFAA 2019 International Workshops
Subtitle of host publication: BDMS, BDQM, and GDMA, Proceedings
Editors: Guoliang Li, Juggapong Natwichai, Joao Gama, Yongxin Tong, Jun Yang
Publisher: Springer Verlag
Pages: 19-33
Number of pages: 15
ISBN (Print): 9783030185893
State: Published - 2019
Externally published: Yes
Event: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019 - Chiang Mai, Thailand
Duration: 22 Apr 2019 to 25 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11448 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019
Country/Territory: Thailand
City: Chiang Mai
Period: 22/04/19 to 25/04/19

Keywords

  • Anonymization
  • Interval value
  • Machine learning
