Feature screening for ultrahigh dimensional categorical data with covariates missing at random

Lyu Ni, Fang Fang, Jun Shao

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Most existing feature screening methods assume that data are fully observed. Developing screening methods for incomplete data is challenging because traditional missing data analysis techniques cannot be directly applied to the ultrahigh dimensional case. A two-step model-free feature screening procedure is developed for ultrahigh dimensional categorical data when some covariate values are missing at random. For each covariate with missing data, the first step screens out the variables in the unspecified propensity function. In the second step, screening statistics such as the adjusted Pearson Chi-Square statistics can be calculated by leveraging the variables obtained in the first step and the special structure of categorical data. Sure screening properties are established for the proposed method. Finite sample performance is investigated by simulation studies and a real data example.
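To give a rough sense of what a Pearson chi-square-type screening statistic looks like for categorical data, the sketch below ranks covariates by a marginal statistic computed from observed rows only. It is a plain complete-case baseline, not the adjusted statistic or the two-step procedure of the article: it ignores the propensity function that the first step is designed to handle. The function names `pearson_chi2_stat` and `screen`, the use of NaN to mark missing covariate values, and the numeric coding of categories are all assumptions made for illustration.

```python
import numpy as np


def pearson_chi2_stat(x, y):
    """Chi-square-type dependence measure between categorical x and y.

    Uses only rows where x is observed (NaN marks a missing value).
    This is a complete-case baseline, not the article's adjusted statistic.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    observed = ~np.isnan(x)
    x, y = x[observed], y[observed]
    if x.size == 0:
        return 0.0  # no observed rows: no evidence of dependence

    # Build the contingency table of y levels (rows) by x levels (columns).
    x_levels, x_idx = np.unique(x, return_inverse=True)
    y_levels, y_idx = np.unique(y, return_inverse=True)
    table = np.zeros((len(y_levels), len(x_levels)))
    np.add.at(table, (y_idx, x_idx), 1)

    # Compare joint cell proportions with the product of the marginals.
    joint = table / table.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return float(((joint - expected) ** 2 / expected).sum())


def screen(X, y, keep):
    """Return the indices of the `keep` covariates with the largest statistic."""
    stats = np.array([pearson_chi2_stat(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-stats)
    return order[:keep], stats
```

A covariate that determines the response exactly receives a larger value than one that is independent of it, so sorting by the statistic and keeping the top few columns gives a simple marginal screening rule.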

Original language: English
Article number: 106824
Journal: Computational Statistics and Data Analysis
Volume: 142
State: Published - Feb 2020

Keywords

  • Feature screening
  • Missing at random
  • Missing covariate
  • Pearson Chi-Square statistic
  • Sure screening property
