TY - JOUR
T1 - Adjusted Pearson Chi-Square feature screening for multi-classification with ultrahigh dimensional data
AU - Ni, Lyu
AU - Fang, Fang
AU - Wan, Fangjiao
N1 - Publisher Copyright:
© 2017, Springer-Verlag GmbH Germany.
PY - 2017/11/1
Y1 - 2017/11/1
AB - Huang et al. (J Bus Econ Stat 32:237–244, 2014) first proposed a Pearson Chi-Square based feature screening procedure tailored to the multi-classification problem with ultrahigh dimensional categorical covariates, which is a common problem in practice but has seldom been discussed in the literature. However, their work establishes the sure screening property only in a limited setting. Moreover, their p value based adjustments for covariates with different numbers of categories do not work well in several practical situations. In this paper, we propose an adjusted Pearson Chi-Square feature screening procedure and a modified method for tuning parameter selection. Theoretically, we establish the sure screening property of the proposed method in general settings. Empirically, the proposed method is more successful than Pearson Chi-Square feature screening in handling non-equal numbers of covariate categories in finite samples. Results of three simulation studies and one real data analysis are presented. Our work together with Huang et al. (J Bus Econ Stat 32:237–244, 2014) provides a solid theoretical foundation and empirical evidence for the family of Pearson Chi-Square based feature screening methods.
KW - Continuous and categorical covariates
KW - Diverging classes
KW - Pearson Chi-Square statistics
KW - Sure screening property
UR - https://www.scopus.com/pages/publications/85030661886
U2 - 10.1007/s00184-017-0629-9
DO - 10.1007/s00184-017-0629-9
M3 - Article
AN - SCOPUS:85030661886
SN - 0026-1335
VL - 80
SP - 805
EP - 828
JO - Metrika
JF - Metrika
IS - 6-8
ER -