TY - GEN
T1 - Estimate unlabeled-data-distribution for semi-supervised PU learning
AU - Hu, Haoji
AU - Sha, Chaofeng
AU - Wang, Xiaoling
AU - Zhou, Aoying
PY - 2012
Y1 - 2012
AB - Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while unlabeled data is used as the test set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent years, and semi-supervised learning is an efficient solution for learning from positive and unlabeled examples (PU learning). Among the semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. This paper proposes an automatic KL-divergence-based semi-supervised learning method that uses knowledge of the unlabeled data distribution. In addition, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of earlier methods. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning; and (2) the proposed framework achieves higher precision than the state-of-the-art method.
UR - https://www.scopus.com/pages/publications/84859722219
U2 - 10.1007/978-3-642-29253-8_3
DO - 10.1007/978-3-642-29253-8_3
M3 - Conference contribution
AN - SCOPUS:84859722219
SN - 9783642292521
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 22
EP - 33
BT - Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Proceedings
T2 - 14th Asia Pacific Web Technology Conference, APWeb 2012
Y2 - 11 April 2012 through 13 April 2012
ER -