TY - JOUR
T1 - Identifying subjective life expectancy risk factors in physically active and inactive middle-aged and older adults using machine learning models
AU - Yang, Jian
AU - Li, Zhihui
AU - Wu, Ming
AU - Zhang, Yuan
AU - Zhang, Bianjiang
AU - Shi, Huiyu
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Background: Physical activity is a key focus in the field of public health, and subjective life expectancy is closely associated with individuals’ physical and psychological well-being. This study aimed to identify the risk factors for subjective life expectancy among middle-aged and older adults with active and inactive physical activity levels, and to provide an evidence base for developing differentiated health intervention strategies. Methods: Based on data from the China Health and Retirement Longitudinal Study (CHARLS) 2018 survey, a total of 10,945 participants were included. Five machine learning models, including Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were separately constructed for the active and inactive groups. To reduce bias caused by class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to generate synthetic samples for the minority class. The dataset was split into a training set (70%) and a testing set (30%), and ten-fold cross-validation combined with grid search was employed to optimize hyperparameters, ensuring both robustness and generalizability of the models. Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, sensitivity, specificity, and F1-score. Results: The active group (4,707 men and 4,885 women) had a mean age of 59.76 years, while the inactive group (662 men and 691 women) had a mean age of 63.00 years. The Support Vector Machine (SVM) model achieved the best performance in the inactive group (AUC: 0.797; accuracy: 0.722; sensitivity: 0.747), whereas the Light Gradient Boosting Machine (LightGBM) model achieved the best performance in the active group (AUC: 0.775; accuracy: 0.745; specificity: 0.814). Feature importance analysis indicated that “age” was the most important variable in the Support Vector Machine (SVM) model, while “perceived health” was the most important variable in the Light Gradient Boosting Machine (LightGBM) model. Conclusion: Machine learning methods can effectively identify key risk factors influencing subjective life expectancy among middle-aged and older adults, and provide valuable guidance for targeted health management strategies tailored to populations with different levels of physical activity.
KW - Machine learning
KW - Middle-aged and elderly people
KW - Physical activity
KW - Subjective life expectancy
UR - https://www.scopus.com/pages/publications/105018883475
U2 - 10.1186/s12889-025-24657-1
DO - 10.1186/s12889-025-24657-1
M3 - Article
C2 - 41094430
AN - SCOPUS:105018883475
SN - 1472-698X
VL - 25
JO - BMC Public Health
JF - BMC Public Health
IS - 1
M1 - 3506
ER -