Divideup: A Generic Improvement Approach for Supervised Learning Using Dataset Partition with Finer Semantical Information

Xinyue Zhang, Qin Li*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Supervised learning technologies, represented by neural networks, have made great progress in many fields. In applications such as image recognition and natural language processing in particular, the entire computation is handed over to the machine learning algorithm, which directly learns the mapping from the feature space to the expected output without much consideration of the semantical information and domain knowledge of the data. In this paper, we propose a generic data refinement approach called divideup, which incorporates finer semantical information into the dataset to obtain a prediction model that captures more detailed information in the training data. Based on an information-theoretic analysis, we have high confidence that the model trained with the refined dataset has better prediction accuracy than the model trained with the original dataset. We conduct extensive experiments on different datasets with state-of-the-art neural network architectures such as ResNet and DenseNet. The experimental results show that divideup improves the prediction accuracy of all these deep learning architectures on the original test set. The divideup approach is also applied to other machine learning models such as random forest, XGBoost and SVM. The results support the conclusion that the refined training data obtained by divideup yields a learned model with better prediction accuracy.
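
The abstract does not say how the finer labels are obtained or how the models are trained, so the following is only a minimal sketch of the general idea as we read it: relabel the training set with finer subclasses, train on those, and map predictions back to the original labels before scoring on the original test set. The toy 1-D data, the nearest-centroid classifier, and the names `fine_y` and `fine_to_coarse` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def fit_centroids(xs, labels):
    """Return {label: mean of the 1-D points carrying that label}."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    return {label: sum(pts) / len(pts) for label, pts in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(predicted, expected):
    """Fraction of positions where predicted and expected labels agree."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Hypothetical toy data: class "A" is made of two distant clusters,
# with class "B" sitting between them.
train_x = [0.0, 1.0, 9.0, 10.0, 4.0, 5.0]
coarse_y = ["A", "A", "A", "A", "B", "B"]                    # original labels
fine_y = ["A_low", "A_low", "A_high", "A_high", "B", "B"]    # assumed finer labels
fine_to_coarse = {"A_low": "A", "A_high": "A", "B": "B"}

# Original test set, labelled with the original (coarse) classes only.
test_x = [0.5, 9.5, 5.0, 4.5]
test_y = ["A", "A", "B", "B"]

# Baseline: train directly on the original labels.
baseline = fit_centroids(train_x, coarse_y)
baseline_pred = [predict(baseline, x) for x in test_x]

# Refined: train on the finer labels, then map each prediction back.
refined = fit_centroids(train_x, fine_y)
refined_pred = [fine_to_coarse[predict(refined, x)] for x in test_x]

print("baseline accuracy:", accuracy(baseline_pred, test_y))
print("refined accuracy:", accuracy(refined_pred, test_y))
```

On this deliberately constructed data the baseline scores 0.5 and the refined model scores 1.0, because a single centroid cannot represent the two separated clusters of class "A". This illustrates the mechanism only; it says nothing about the size of the gains reported in the paper.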

Original language: English
Title of host publication: 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 635-641
Number of pages: 7
ISBN (Electronic): 9781728185262
State: Published - 11 Oct 2020
Event: 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020 - Toronto, Canada
Duration: 11 Oct 2020 to 14 Oct 2020

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 2020-October
ISSN (Print): 1062-922X

Conference

Conference: 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
Country/Territory: Canada
City: Toronto
Period: 11/10/20 to 14/10/20
