TY - JOUR
T1 - Semiparametric Inference in a Genetic Mixture Model
AU - Li, Pengfei
AU - Liu, Yukun
AU - Qin, Jing
N1 - Publisher Copyright: © 2017 American Statistical Association.
PY - 2017/7/3
Y1 - 2017/7/3
N2 - In genetic backcross studies, data are often collected from complex mixtures of distributions with known mixing proportions. Previous approaches to the inference of these genetic mixture models involve parameterizing the component distributions. However, model misspecification of any form is expected to have detrimental effects. We propose a semiparametric likelihood method for genetic mixture models: the empirical likelihood under the exponential tilting model assumption, in which the log ratio of the probability (density) functions from the components is linear in the observations. An application to mice cancer genetics involves random numbers of offspring within a litter. In other words, the cluster size is a random variable. We wish to test the null hypothesis that there is no difference between the two components in the mixture model, but unfortunately we find that the Fisher information is degenerate. As a consequence, the conventional two-term expansion in the likelihood ratio statistic does not work. By using a higher-order expansion, we are able to establish a nonstandard convergence rate N^{-1/4} for the odds ratio parameter estimator. Moreover, the limiting distribution of the empirical likelihood ratio statistic is derived. The underlying distribution function of each component can also be estimated semiparametrically. Analogously to the full parametric approach, we develop an expectation and maximization algorithm for finding the semiparametric maximum likelihood estimator. Simulation results and a real cancer application indicate that the proposed semiparametric method works much better than parametric methods. Supplementary materials for this article are available online.
AB - In genetic backcross studies, data are often collected from complex mixtures of distributions with known mixing proportions. Previous approaches to the inference of these genetic mixture models involve parameterizing the component distributions. However, model misspecification of any form is expected to have detrimental effects. We propose a semiparametric likelihood method for genetic mixture models: the empirical likelihood under the exponential tilting model assumption, in which the log ratio of the probability (density) functions from the components is linear in the observations. An application to mice cancer genetics involves random numbers of offspring within a litter. In other words, the cluster size is a random variable. We wish to test the null hypothesis that there is no difference between the two components in the mixture model, but unfortunately we find that the Fisher information is degenerate. As a consequence, the conventional two-term expansion in the likelihood ratio statistic does not work. By using a higher-order expansion, we are able to establish a nonstandard convergence rate N^{-1/4} for the odds ratio parameter estimator. Moreover, the limiting distribution of the empirical likelihood ratio statistic is derived. The underlying distribution function of each component can also be estimated semiparametrically. Analogously to the full parametric approach, we develop an expectation and maximization algorithm for finding the semiparametric maximum likelihood estimator. Simulation results and a real cancer application indicate that the proposed semiparametric method works much better than parametric methods. Supplementary materials for this article are available online.
KW - Cluster mixture data
KW - Degenerate Fisher information
KW - Exponential tilting model
KW - Genetic mixture model
UR - https://www.scopus.com/pages/publications/85018718935
U2 - 10.1080/01621459.2016.1208614
DO - 10.1080/01621459.2016.1208614
M3 - Article
AN - SCOPUS:85018718935
SN - 0162-1459
VL - 112
SP - 1250
EP - 1260
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 519
ER -