Solving the missing at random problem in semi-supervised learning: An inverse probability weighting method

Research output: Contribution to journal, Article, peer-review

Abstract

We propose an estimator for the population mean (Formula presented.) in the semi-supervised learning setting under the Missing at Random (MAR) assumption. This setting assumes that the probability of observing (Formula presented.), denoted by (Formula presented.), depends on the total sample size (Formula presented.) and satisfies (Formula presented.). To estimate (Formula presented.) efficiently, we introduce an adaptive estimator based on inverse probability weighting (IPW) and cross-fitting. Our theoretical analysis shows that the proposed estimator is consistent and efficient, with a convergence rate of (Formula presented.). This rate is slower than the typical (Formula presented.) rate because the proportion of labelled data diminishes as the sample size (Formula presented.) increases in the semi-supervised setting. We also prove the consistency of IPW–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
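The two generic ingredients named in the abstract, inverse probability weighting and cross-fitting, can be illustrated with a minimal sketch. It is not the authors' adaptive estimator (which, per the keywords, also involves dimension reduction); it only shows the common template: estimate the labelling probability P(R = 1 | X) on one part of the data, here with Nadaraya–Watson kernel regression, and use it to reweight the labelled responses in the other part. Everything specific below is an assumption for illustration: the hypothetical helpers `nw_propensity` and `ipw_mean_crossfit`, a scalar covariate, a Gaussian kernel with bandwidth 0.2, two folds, normalized (Hájek) weights, the ad hoc floor `pi_floor`, and toy data whose labelling probability is fixed at no more than 0.2 (in the paper's setting it shrinks with the sample size, so the floor would need more care).

```python
# Hypothetical sketch: generic IPW mean estimation with cross-fitted
# Nadaraya-Watson propensities. Not the paper's estimator.
import numpy as np


def nw_propensity(x_train, r_train, x_eval, bandwidth, chunk=500):
    """Nadaraya-Watson (Gaussian kernel) estimate of P(R = 1 | X = x), scalar X."""
    r_train = r_train.astype(float)
    out = np.empty(len(x_eval))
    for start in range(0, len(x_eval), chunk):  # process rows in chunks to bound memory
        xe = x_eval[start:start + chunk]
        u = (xe[:, None] - x_train[None, :]) / bandwidth
        k = np.exp(-0.5 * u * u)
        # guard against an all-zero kernel row far from the training data
        denom = np.maximum(k.sum(axis=1), np.finfo(float).tiny)
        out[start:start + chunk] = (k @ r_train) / denom
    return out


def ipw_mean_crossfit(x, y, r, bandwidth, n_folds=2, pi_floor=1e-6, seed=0):
    """Normalized (Hajek) IPW estimate of E[Y] with cross-fitted propensities.

    y may be NaN wherever r == 0; those entries are never used.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds  # balanced random fold labels
    pi_hat = np.empty(len(x))
    for k in range(n_folds):
        test = folds == k
        # the propensity for fold k is fitted only on the other folds
        pi_hat[test] = nw_propensity(x[~test], r[~test], x[test], bandwidth)
    pi_hat = np.clip(pi_hat, pi_floor, 1.0)  # ad hoc floor to avoid division by zero
    w = r / pi_hat
    y_obs = np.where(r == 1, y, 0.0)  # unlabelled responses contribute nothing
    return np.sum(w * y_obs) / np.sum(w)


# Toy MAR data: labelling is more likely for large x, so the
# complete-case mean of the labelled responses is biased upward.
rng = np.random.default_rng(1)
n = 10_000
x = rng.uniform(-2.0, 2.0, n)
y_full = 1.0 + x + 0.5 * rng.standard_normal(n)  # true E[Y] = 1
pi_true = 0.2 / (1.0 + np.exp(-x))               # P(R = 1 | X = x), depends on x only
r = (rng.uniform(size=n) < pi_true).astype(int)
y = np.where(r == 1, y_full, np.nan)             # hide the unlabelled responses

naive = np.nanmean(y)  # complete-case mean of labelled y
ipw = ipw_mean_crossfit(x, y, r, bandwidth=0.2)
print(f"labelled: {r.sum()}, naive: {naive:.3f}, IPW: {ipw:.3f}")
```

With roughly 10% of the responses labelled, the complete-case mean should sit well above the true value 1, while the IPW estimate should land close to 1; the exact numbers depend on the seed. Cross-fitting keeps each observation's propensity estimate independent of its own label, which is the usual motivation for the technique.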

Original language: English
Article number: e707
Journal: Stat
Volume: 13
Issue number: 3
State: Published, Sep 2024

Keywords

  • dimension reduction
  • inverse probability weighting
  • mean estimation
  • missing at random
  • semi-supervised learning
