Scaling up graph-based semisupervised learning via prototype vector machines

Kai Zhang, Liang Lan, James T. Kwok, Slobodan Vucetic, Bahram Parvin

Research output: Contribution to journalArticlepeer-review

41 Scopus citations

Abstract

When the amount of labeled data are limited, semisupervised learning can improve the learner's performance by also using the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via ℓ1-regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning.

Original languageEnglish
Article number6803073
Pages (from-to)444-457
Number of pages14
JournalIEEE Transactions on Neural Networks and Learning Systems
Volume26
Issue number3
DOIs
StatePublished - 1 Mar 2015
Externally publishedYes

Keywords

  • Graph-based methods
  • large data sets
  • low-rank approximation
  • manifold regularization
  • semisupervised learning.

Fingerprint

Dive into the research topics of 'Scaling up graph-based semisupervised learning via prototype vector machines'. Together they form a unique fingerprint.

Cite this