Improving machine-learned surface NO2 concentration mapping models with domain knowledge from data science perspective

Mengqian Hu, Kaixu Bai, Ke Li, Zhe Zheng, Yibing Sun, Liuqing Shao, Ruijie Li, Chaoshun Liu

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Learning from big Earth data via supervised machine learning has become a popular approach for ambient air quality mapping. However, a knowledge gap remains as to which level of satellite data should be used as the critical proxy variable, and how to improve data-driven models with domain knowledge also remains elusive. Taking surface NO2 concentration mapping as an illustration, we performed inter-comparison studies between a set of machine-learned surface NO2 concentration estimation models established with different levels of satellite products, ranging from Level 1 (L1) apparent radiance from TROPOMI on board Sentinel-5P to Level 2 (L2) NO2 slant column density (SCD) and tropospheric vertical column density (VCD). TROPOMI bands sensitive to surface NO2 were first pinpointed via radiative transfer simulations; band ratios between nine sensitive channels and their adjacent insensitive channels were then calculated and used as the counterpart of raw radiance observations. The results indicated that the prediction model trained with L1 band ratios at a few discrete channels yielded higher prediction accuracy (R2 = 0.71, RMSE = 7.98 μg m−3) than the model using raw L1 radiance data at all available bands (R2 = 0.68, RMSE = 8.40 μg m−3), largely benefiting from the improved signal-to-noise ratio and the reduced model complexity due to fewer band ratio inputs. Even higher modeling accuracies were attained with L2 data products: the model with SCD (R2 = 0.78, RMSE = 6.54 μg m−3) was found to perform slightly better than the model with VCD (R2 = 0.77, RMSE = 6.79 μg m−3), even though the latter is expected to correlate better with surface NO2 variations. The modeling accuracy was further improved by including solar zenith angle, aerosol optical depth, surface albedo and pressure, which are closely associated with the air mass factor, with R2 improved to 0.80 and RMSE reduced to 6.28 μg m−3. Overall, our results not only provide actionable guidance on satellite-based surface NO2 concentration modeling but also underscore the critical importance of domain knowledge in improving machine-learned models to aid in large-scale air quality surveillance.
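
As a rough illustration of the quantities named in the abstract, the sketch below computes band ratios between sensitive and reference channels and the two accuracy metrics, RMSE and R2. It is not the authors' code: the function names, the channel index pairs, and the use of the coefficient of determination for R2 are all assumptions made for this example, and the numbers are toy values rather than TROPOMI data.

```python
import math


def band_ratios(radiance, pairs):
    """Return one ratio per (sensitive, reference) channel pair for each sample.

    radiance: list of per-sample lists of channel radiances (toy values here).
    pairs: hypothetical (sensitive_index, reference_index) tuples.
    """
    return [[row[s] / row[r] for s, r in pairs] for row in radiance]


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed definition of R2 for this sketch)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example: two samples, three channels, two hypothetical channel pairs.
toy_radiance = [[2.0, 4.0, 8.0], [3.0, 6.0, 3.0]]
toy_pairs = [(0, 1), (2, 1)]
features = band_ratios(toy_radiance, toy_pairs)
```

The ratio features would replace the raw radiances as model inputs, which reduces the number of predictors to one per channel pair.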

Original language: English
Article number: 120372
Journal: Atmospheric Environment
Volume: 322
State: Published - 1 Apr 2024

Keywords

  • Air quality
  • Big data analytics
  • Machine learning
  • NO2 pollution
  • TROPOMI
