TY - JOUR
T1 - Multi-step ahead dissolved oxygen concentration prediction based on knowledge guided ensemble learning and explainable artificial intelligence
AU - Wu, Junhao
AU - Wang, Zhaocai
AU - Dong, Jinghan
AU - Yao, Zhiyuan
AU - Chen, Xi
AU - Fan, Heshan
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/6
Y1 - 2024/6
N2 - Accurate water quality prediction is crucial for effective environmental management and decision-making. However, previous studies have relied solely on historical data to simulate water quality, overlooking the potential discrepancies between predicted values and actual observations. Additionally, the opacity of machine learning models has posed challenges to the credibility of their predictions. Hence, considering the excellent nonlinear fitting ability of ensemble tree models, especially the Categorical Boosting (Catboost) model, this study proposes a knowledge-guided Catboost (KGCatboost) model for predicting dissolved oxygen (DO) concentration, one of the vital water quality indicators, in 15 river sections of the Yangtze River Basin in Yunnan Province, China. Furthermore, to enhance the model's interpretability, we employ the SHapley Additive exPlanations (SHAP) method to analyze the contribution of each input variable within the water body. The results demonstrate that, on the test set of each dataset, the mean Nash-Sutcliffe Efficiency (NSE) value of KGCatboost is 0.874, an improvement of 0.34% and 3.07% over Catboost and eXtreme Gradient Boosting (Xgboost), respectively. In addition, this study reveals that pH has the most significant impact on DO concentration: as pH increases, the DO concentration increases significantly. A regulatory mechanism has also been developed to alleviate the hazards caused by low DO concentrations. The KGCatboost model can provide valuable guidance for water resource management processes.
KW - Categorical boosting
KW - Dissolved oxygen
KW - eXtreme gradient boosting
KW - Knowledge-guided catboost
KW - Multi-step ahead prediction
KW - SHapley additive explanations
UR - https://www.scopus.com/pages/publications/85192956725
U2 - 10.1016/j.jhydrol.2024.131297
DO - 10.1016/j.jhydrol.2024.131297
M3 - Article
AN - SCOPUS:85192956725
SN - 0022-1694
VL - 636
JO - Journal of Hydrology
JF - Journal of Hydrology
M1 - 131297
ER -