TY - JOUR
T1 - SR-Forest
T2 - A Genetic Programming-Based Heterogeneous Ensemble Learning Method
AU - Zhang, Hengzhe
AU - Zhou, Aimin
AU - Chen, Qi
AU - Xue, Bing
AU - Zhang, Mengjie
N1 - Publisher Copyright:
© 1997-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Ensemble learning methods have been widely used in machine learning in recent years due to their high predictive performance. With the development of genetic programming-based symbolic regression (GPSR) methods, many papers have begun to choose a popular ensemble learning method, random forests (RFs), as the baseline competitor. Instead of considering them as competitors, an alternative idea might be to consider symbolic regression (SR) as an enhancement technique for RF. GPSR methods, which fit a smooth function, are complementary to the piecewise nature of decision trees (DTs), as smooth variation is common in regression problems. In this article, we propose to form an ensemble model with SR-based DTs to address this issue. Furthermore, we design a guided mutation operator to speed up the search on high-dimensional problems, a multifidelity evaluation strategy to reduce the computational cost, and an ensemble selection mechanism to improve predictive performance. Finally, experimental results on a regression benchmark with 120 datasets show that the proposed ensemble model outperforms 25 existing SR and ensemble learning methods. Moreover, the proposed method can provide notable insights on an XGBoost hyperparameter performance prediction task, which is an important application area of ensemble learning methods.
KW - Evolutionary feature construction
KW - evolutionary forest (EF)
KW - genetic programming (GP)
KW - random forest (RF)
UR - https://www.scopus.com/pages/publications/85148443420
U2 - 10.1109/TEVC.2023.3243172
DO - 10.1109/TEVC.2023.3243172
M3 - Article
AN - SCOPUS:85148443420
SN - 1089-778X
VL - 28
SP - 1484
EP - 1498
JO - IEEE Transactions on Evolutionary Computation
JF - IEEE Transactions on Evolutionary Computation
IS - 5
ER -