Abstract
A central challenge in statistical prediction with data from multiple sources is that, for many sampled subjects, not all of the associated covariate data are available. New statistical methodology is therefore needed to handle this type of “fragmentary data,” which has become increasingly common in recent years. In this article, we propose a novel method based on frequentist model averaging that fits a set of candidate models using all available covariate data. The model averaging weights are selected by delete-one cross-validation on the data from complete cases. The asymptotic optimality of the selected weights is rigorously proved under certain conditions. Simulation studies confirm the finite-sample performance of the proposed method. An example of personal income prediction, based on real data from a leading e-community of wealth management in China, is also presented for illustration.
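The idea summarized in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' algorithm: each candidate model is an ordinary least-squares fit on every subject that observes its covariates, and the weights are restricted to the simplex and chosen to minimize the delete-one cross-validation error on the complete cases. All function names, the synthetic data, the three missingness patterns, and the Frank-Wolfe solver for the weights are assumptions made for illustration.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda q: abs(A[q][c]))
        A[c], A[piv] = A[piv], A[c]
        for q in range(p):
            if q != c:
                f = A[q][c] / A[c][c]
                A[q] = [a - f * b for a, b in zip(A[q], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def usable(row, cols):
    """True if the subject observes every covariate in `cols`."""
    return all(row[c] is not None for c in cols)

def design(row, cols):
    """Intercept plus the selected covariates of one subject."""
    return [1.0] + [row[c] for c in cols]

def fit_candidate(X, y, cols, skip=None):
    """Fit one candidate model on all subjects observing `cols`,
    optionally leaving out subject `skip` (assumed refit scheme)."""
    idx = [i for i, r in enumerate(X) if i != skip and usable(r, cols)]
    return ols_fit([design(X[i], cols) for i in idx], [y[i] for i in idx])

def predict(beta, row, cols):
    return sum(b * v for b, v in zip(beta, design(row, cols)))

def cv_weights(X, y, models, iters=500):
    """Simplex weights minimizing delete-one CV error on complete cases."""
    cc = [i for i, r in enumerate(X) if usable(r, range(len(r)))]
    # P[a][k]: prediction for complete case a from model k fitted without it
    P = [[predict(fit_candidate(X, y, m, skip=i), X[i], m) for m in models]
         for i in cc]
    yc = [y[i] for i in cc]
    K = len(models)
    w = [1.0 / K] * K
    for _ in range(iters):  # Frank-Wolfe steps with exact line search
        fit = [sum(wk * pk for wk, pk in zip(w, row)) for row in P]
        res = [a - b for a, b in zip(yc, fit)]
        grad = [-2 * sum(r * row[k] for r, row in zip(res, P)) for k in range(K)]
        j = min(range(K), key=lambda k: grad[k])   # best vertex of the simplex
        d = [row[j] - f for row, f in zip(P, fit)]
        dd = sum(v * v for v in d)
        if dd == 0:
            break
        g = max(0.0, min(1.0, sum(r * v for r, v in zip(res, d)) / dd))
        w = [(1 - g) * wk + (g if k == j else 0.0) for k, wk in enumerate(w)]
    return w

# Synthetic fragmentary data (hypothetical): None marks an unavailable covariate.
random.seed(0)
X, y = [], []
for i in range(60):
    x = [random.gauss(0, 1) for _ in range(3)]
    y.append(1.0 + 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + random.gauss(0, 0.5))
    if i % 3 == 1:
        x[2] = None            # third covariate missing
    elif i % 3 == 2:
        x[1] = x[2] = None     # only the first covariate observed
    X.append(x)

models = [(0,), (0, 1), (0, 1, 2)]  # covariate sets of the candidate models
w = cv_weights(X, y, models)
print("model averaging weights:", w)
```

In this sketch, the averaged prediction for a new subject with all covariates observed would be the weighted sum of the candidate-model predictions. The smaller models borrow strength from subjects that the full model has to discard, which is the motivation for averaging rather than fitting on complete cases alone.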
| Original language | English |
|---|---|
| Pages (from-to) | 517-527 |
| Number of pages | 11 |
| Journal | Journal of Business and Economic Statistics |
| Volume | 37 |
| Issue number | 3 |
| State | Published - 3 Jul 2019 |
Keywords
- Asymptotic optimality
- Cross-validation
- Heteroscedastic errors
- Linear regression models
- Multiple data sources