TY - JOUR
T1 - Prediction of ammonia and total nitrogen in large freshwater lake watershed based on small sample data and analysis of their spatiotemporal variation and driving mechanism
AU - Luo, Chengming
AU - Wang, Xihua
AU - Xu, Y. Jun
AU - Jia, Shunqing
AU - Liu, Zejun
AU - Mao, Boyang
AU - Lv, Qinya
AU - Ji, Xuming
AU - Rong, Yanxin
AU - Dai, Yan
N1 - Publisher Copyright: © 2025 The Institution of Chemical Engineers
PY - 2025/11
Y1 - 2025/11
N2 - Ammonia nitrogen (NH₃-N) and total nitrogen (TN) pollution poses serious threats to freshwater lake ecosystems, making accurate prediction essential for watershed management. However, limited and variable-quality data challenge the performance of existing prediction models. This study proposed an integrated prediction framework combining sample enhancement, adaptive feature selection, and multiple machine learning methods to improve NH₃-N and TN prediction in the Poyang Lake watershed. A Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) was used to generate high-quality virtual samples, enhancing data availability. Recursive Feature Elimination (RFE) was then applied to identify key variables and remove redundancy, improving model efficiency. Four prediction models, Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit (GRU), and Extreme Learning Machine, were constructed and compared. In addition, Spearman correlation analysis and principal component analysis were used to reveal the main sources of TN and NH₃-N pollution. Results showed clear spatiotemporal heterogeneity in NH₃-N and TN levels, with the Fuhe River sub-basin being the most polluted. Agricultural runoff, domestic sewage, and industrial discharge were identified as key pollution sources. WGAN-GP and RFE significantly improved model performance: the R² of the best prediction model for TN (GRU) improved from 0.515 to 0.709, and that of the best prediction model for NH₃-N (Bi-LSTM) improved from 0.369 to 0.909. The deep learning models demonstrated similar predictive capabilities and could be integrated to enhance accuracy and stability. This study offers an effective, data-efficient approach for water quality prediction under small-sample conditions and provides scientific guidance for watershed environmental management.
KW - Adaptive feature selection
KW - Ammonia nitrogen
KW - Multiple machine learning
KW - Poyang Lake watershed
KW - Sample enhancement
KW - Total nitrogen
UR - https://www.scopus.com/pages/publications/105016458221
U2 - 10.1016/j.psep.2025.107887
DO - 10.1016/j.psep.2025.107887
M3 - Article
AN - SCOPUS:105016458221
SN - 0957-5820
VL - 203
JO - Process Safety and Environmental Protection
JF - Process Safety and Environmental Protection
M1 - 107887
ER -