TY - JOUR
T1 - Development of gradient boosting-assisted machine learning data-driven model for free chlorine residual prediction
AU - Helm, Wiley
AU - Zhong, Shifa
AU - Reid, Elliot
AU - Igou, Thomas
AU - Chen, Yongsheng
N1 - Publisher Copyright:
© 2024, Higher Education Press.
PY - 2024/2
Y1 - 2024/2
N2 - Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML method (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flow rates), to predict FCR. Four gradient-boosting models were implemented, with the best-performing model achieving a coefficient of determination, R², of 0.937. Shapley additive explanations (SHAP) values were used to interpret the model’s results, revealing that standard DWT operating parameters, although non-intuitive and theoretically non-causal, vastly improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods that give plant operators better information for implementing safe chlorine dosing to maintain optimum FCR.
KW - Chlorination
KW - Data-driven modeling
KW - Disinfection
KW - Drinking water treatment
KW - Machine learning
UR - https://www.scopus.com/pages/publications/85178477685
U2 - 10.1007/s11783-024-1777-6
DO - 10.1007/s11783-024-1777-6
M3 - Article
AN - SCOPUS:85178477685
SN - 2095-2201
VL - 18
JO - Frontiers of Environmental Science and Engineering
JF - Frontiers of Environmental Science and Engineering
IS - 2
M1 - 17
ER -