Robust model-free feature screening based on modified Hoeffding measure for ultra-high dimensional data

  • Yuan Yu
  • Di He
  • Yong Zhou*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Sure independence screening (SIS) has become a cutting-edge dimension reduction technique for extracting important features from ultrahigh-dimensional data in statistical learning. Many screening methods are designed for specific models that satisfy certain assumptions. With more data types and more complicated models becoming available, a robust model-free procedure with less restrictive conditions on the data is required. In this paper, we propose a modified Hoeffding measure which efficiently characterizes the dependence between two random variables. The modified Hoeffding measure lies between 0 and 1 and, under some mild conditions, equals zero if and only if the two variables are independent. This property enables us to propose a novel feature screening procedure based on it without specifying the regression structure. The proposed method is robust to heavy-tailed data and outliers in both the predictors and the response, and is suitable for complex data including discrete and multivariate variables. In addition, it can extract important features even when the underlying model is complicated. We further establish the sure screening property and the ranking consistency property even when the dimensionality is of exponential order in the sample size, without assuming any moment condition on the predictors or the response. Simulations and an analysis of real data demonstrate the versatility and practicability of the proposed method in comparison with other state-of-the-art approaches.
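The abstract does not define the modified Hoeffding measure, so the following is only a minimal sketch of the general idea of marginal, model-free screening. As a stand-in it uses a plug-in estimate of the classical Hoeffding-type index, the integral of (F_XY - F_X F_Y)^2 with respect to F_XY, which is not the paper's modified measure and is not normalized to [0, 1]. The function names `hoeffding_type_index` and `screen_features` and the cutoff `d` are hypothetical choices made for illustration.

```python
import numpy as np


def hoeffding_type_index(x, y):
    """Plug-in estimate of the classical index: integral of (F_xy - F_x*F_y)^2 dF_xy.

    Assumption: this is a stand-in for the paper's modified measure,
    whose exact definition is not given in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, j] is True when x[j] <= x[i]; likewise for y.
    le_x = x[None, :] <= x[:, None]
    le_y = y[None, :] <= y[:, None]
    f_x = le_x.mean(axis=1)            # empirical marginal CDF of x at each x[i]
    f_y = le_y.mean(axis=1)            # empirical marginal CDF of y at each y[i]
    f_xy = (le_x & le_y).mean(axis=1)  # empirical joint CDF at each (x[i], y[i])
    return float(np.mean((f_xy - f_x * f_y) ** 2))


def screen_features(X, y, d):
    """Score every column of X against y and return the d highest-scoring columns."""
    scores = np.array(
        [hoeffding_type_index(X[:, j], y) for j in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]  # column indices, largest score first
    return keep, scores
```

Because the index depends on the data only through order comparisons, it is unchanged by strictly increasing transformations of either variable and needs no moment assumptions, which is the kind of robustness the abstract emphasizes. A call such as `keep, scores = screen_features(X, y, d=20)` would return the indices of the 20 columns of `X` most strongly associated with `y` under this stand-in index.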

Original language: English
Pages (from-to): 473-489
Number of pages: 17
Journal: Statistics and its Interface
Volume: 11
Issue number: 3
State: Published - 2018

Keywords

  • Feature screening
  • Hoeffding measure
  • Ranking consistency property
  • Robustness
  • Sure screening property
  • Ultrahigh-dimensional data
