Multi-step ahead dissolved oxygen concentration prediction based on knowledge guided ensemble learning and explainable artificial intelligence

  • Junhao Wu
  • Zhaocai Wang*
  • Jinghan Dong
  • Zhiyuan Yao
  • Xi Chen
  • Heshan Fan

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Accurate water quality prediction is crucial for effective environmental management and decision-making. However, previous studies have relied solely on historical data to simulate water quality, overlooking the potential discrepancies between predicted values and actual observations. Additionally, the opacity of machine learning models has posed challenges to the credibility of their predictions. Hence, considering the excellent nonlinear fitting ability of ensemble tree models, especially the Categorical Boosting (Catboost) model, this study proposes a knowledge-guided Catboost (KGCatboost) model for predicting the dissolved oxygen (DO) concentration, a vital water quality indicator, in 15 river sections of the Yangtze River Basin in Yunnan Province, China. Furthermore, to enhance the model's interpretability, we employ the SHapley Additive exPlanations (SHAP) method to analyze the contribution of each input variable within the water body. The results demonstrate that, on the test set of each dataset, the mean Nash-Sutcliffe Efficiency (NSE) value of KGCatboost is 0.874, an improvement of 0.34% and 3.07% over Catboost and eXtreme Gradient Boosting (Xgboost), respectively. In addition, this study reveals that pH has the most significant impact on DO concentrations. Specifically, as the pH increased, the DO concentration increased significantly. A regulatory mechanism has also been developed to alleviate the hazards caused by low DO concentrations. The KGCatboost model can provide valuable guidance for water resource management processes.
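
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard Nash-Sutcliffe Efficiency, NSE = 1 - Σ(obs - pred)² / Σ(obs - mean(obs))². It is not the authors' code; the function name `nash_sutcliffe_efficiency` and the sample values in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """Return NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    A value of 1 means a perfect fit; 0 means the model is no better
    than always predicting the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared deviation of the observations from their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when the observations are constant")

    return 1.0 - residual / variance
```

As a usage example with made-up numbers, `nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])` gives a perfect score, while predicting the observed mean for every sample gives a score of zero. A mean NSE across datasets, as reported in the abstract, would be the average of this value over each dataset's test set.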

Original language: English
Article number: 131297
Journal: Journal of Hydrology
Volume: 636
State: Published - Jun 2024

Keywords

  • Categorical boosting
  • Dissolved oxygen
  • eXtreme gradient boosting
  • Knowledge-guided catboost
  • Multi-step ahead prediction
  • SHapley additive explanations
