TY - JOUR
T1 - Scaling up graph-based semisupervised learning via prototype vector machines
AU - Zhang, Kai
AU - Lan, Liang
AU - Kwok, James T.
AU - Vucetic, Slobodan
AU - Parvin, Bahram
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2015/3/1
Y1 - 2015/3/1
N2 - When the amount of labeled data are limited, semisupervised learning can improve the learner's performance by also using the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via ℓ1-regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning.
AB - When the amount of labeled data are limited, semisupervised learning can improve the learner's performance by also using the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via ℓ1-regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning.
KW - Graph-based methods
KW - large data sets
KW - low-rank approximation
KW - manifold regularization
KW - semisupervised learning.
UR - https://www.scopus.com/pages/publications/84923874457
U2 - 10.1109/TNNLS.2014.2315526
DO - 10.1109/TNNLS.2014.2315526
M3 - 文章
C2 - 25720002
AN - SCOPUS:84923874457
SN - 2162-237X
VL - 26
SP - 444
EP - 457
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 3
M1 - 6803073
ER -