
Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

  • Zhangchi Zhu
  • Lu Wang
  • Pu Zhao
  • Chao Du
  • Wei Zhang*
  • Hang Dong
  • Bo Qiao
  • Qingwei Lin*
  • Saravan Rajmohan
  • Dongmei Zhang
  • *Corresponding author for this work
  • East China Normal University
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds, so that conventional supervised methods can be applied with both positive and negative samples. Owing to label uncertainty in the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably appear and may even accumulate during training. These errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition underlies curriculum learning, which uses only easier cases in the early stage of training before introducing more complex ones. Specifically, we utilize a novel ''hardness'' measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during training, including more ''easy'' samples in the early stages. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
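The curriculum-style selection described above can be sketched in a few lines. In this hypothetical illustration (not the paper's exact method), the "hardness" proxy is simply the model's predicted positive probability for an unlabeled sample, and the fraction of unlabeled data admitted as pseudo-negatives grows linearly over training; the function name and pacing schedule are illustrative assumptions.

```python
def select_pseudo_negatives(unlabeled_scores, epoch, total_epochs,
                            start_frac=0.3, end_frac=0.8):
    """Return indices of unlabeled samples treated as pseudo-negatives.

    unlabeled_scores: predicted P(positive) for each unlabeled sample;
    a low score is taken as an "easy" negative. The admitted fraction
    grows linearly from start_frac to end_frac over training, so only
    the easiest samples are used in the early stage.
    """
    progress = epoch / max(total_epochs - 1, 1)
    frac = start_frac + (end_frac - start_frac) * progress
    k = int(frac * len(unlabeled_scores))
    # Rank unlabeled samples by score ascending: lowest score = easiest.
    order = sorted(range(len(unlabeled_scores)),
                   key=lambda i: unlabeled_scores[i])
    return order[:k]


if __name__ == "__main__":
    scores = [0.05, 0.9, 0.2, 0.7, 0.1, 0.4, 0.85, 0.3]
    early = select_pseudo_negatives(scores, epoch=0, total_epochs=10)
    late = select_pseudo_negatives(scores, epoch=9, total_epochs=10)
    print(early)  # -> [0, 4]: only the two easiest samples
    print(late)   # -> [0, 4, 2, 7, 5, 3]: a larger, harder set
```

In each iteration the pseudo-negative set would be re-selected with the current model's scores, which is what allows earlier misclassifications to be self-corrected rather than accumulate.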

Original language: English
Title of host publication: KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 3663-3673
Number of pages: 11
ISBN (electronic): 9798400701030
DOI
Publication status: Published - 4 Aug 2023
Event: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 - Long Beach, United States
Duration: 6 Aug 2023 → 10 Aug 2023

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (print): 2154-817X

Conference

Conference: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
Country/Territory: United States
City: Long Beach
Period: 6/08/23 → 10/08/23
