Science & Technology

Machine learning for uncovering the source-plume mass-transfer mechanisms of organic contaminants in groundwater and accurately predicting contaminant fluxes

The closure and relocation of China's chemical and pesticide companies have left behind thousands of organically contaminated sites, which seriously threaten the soil and groundwater environment. The downstream contaminant flux is the mass flux of the dissolved plume released by an organic contaminant source zone, and it is often used as a key indicator in site decision-making and environmental risk assessment.

The contaminant mass flux is controlled by the architecture of the organic contaminant source zone. Because the relationship between source zone architecture and the downstream mass flux of the dissolved plume is highly nonlinear, it is difficult to express explicitly through physical laws, and the link between contaminant source and dissolved plume still needs to be clarified. Previous studies have explored this relationship only qualitatively, so the nonlinear, multi-stage dissolution behavior of organic contaminant source zones remains hard to characterize quantitatively, and accurate predictive models for this non-monotonic dissolution process are lacking.

Recently, the research group of Prof. Jichun Wu and Prof. Xiaoqing Shi proposed a new method based on explainable neural networks to explore the source-plume mass-transfer mechanisms of organic contaminants in groundwater, and developed an accurate model for contaminant mass flux prediction. To represent the highly nonlinear relationship between source zones and dissolved plumes, they built a Bayesian Neural Network (BNN) framework that learns this “source-plume” relationship from data. They then used an interpretable machine learning method (Expected Gradients) to reveal the physical causes and the main controlling factors of the multi-stage nonlinear dissolution behavior of organic contaminant source zones. The results show that the peak mass flux in the early stage of source depletion is mainly controlled by the vertical distribution of low-saturation source zones, while the tail of the mass flux in the late stage is controlled by both the lateral length and the mean permeability of high-saturation source zones. Based on these findings, the group developed a deep learning-based upscaled model for contaminant mass flux prediction. Using only a small number of source-zone spatial metrics, the model accurately reproduces the multi-stage nonlinear dissolution process, providing strong technical support for the management and risk assessment of contaminated sites.
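For readers unfamiliar with Expected Gradients, the idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model, its weights, and the three input features (stand-ins for source-zone metrics) are hypothetical. The method attributes a prediction to each input by averaging gradients along straight paths from reference samples to the input of interest. To keep the example deterministic, this sketch uses every reference sample and a midpoint grid over the interpolation coefficient, rather than the random draws typically used in practice.

```python
import numpy as np

def expected_gradients(grad_fn, x, baselines, n_alpha=64):
    """Estimate Expected Gradients attributions for a single input x.

    grad_fn(z) must return the gradient of the model output with respect
    to each input feature, for a batch z of shape (n, d).
    """
    x = np.asarray(x, dtype=float)
    baselines = np.asarray(baselines, dtype=float)
    # Midpoint grid on (0, 1); an assumption replacing random alpha draws.
    alpha = ((np.arange(n_alpha) + 0.5) / n_alpha)[:, None]
    total = np.zeros_like(x)
    for b in baselines:                      # average over all reference samples
        path = b + alpha * (x - b)           # points on the straight line b -> x
        total += (x - b) * grad_fn(path).mean(axis=0)
    return total / len(baselines)

# Hypothetical toy surrogate: flux = tanh(w . metrics), with made-up weights.
w = np.array([1.5, -0.5, 0.8])

def toy_model(z):
    return np.tanh(z @ w)

def toy_grad(z):
    # Analytic gradient of tanh(w . z) with respect to z.
    return (1.0 - np.tanh(z @ w) ** 2)[:, None] * w

rng = np.random.default_rng(1)
baselines = rng.normal(size=(200, 3))        # synthetic reference inputs
x = np.array([0.9, 0.2, -0.4])               # input to be explained

attr = expected_gradients(toy_grad, x, baselines)
# Completeness: attributions should sum to f(x) minus the mean reference output.
gap = toy_model(x[None])[0] - toy_model(baselines).mean()
print("attributions:", attr)
print("sum of attributions:", attr.sum(), "vs. output gap:", gap)
```

The sum of the attributions approximately equals the difference between the prediction for the input and the average prediction over the reference samples, which is what allows each feature's share of a predicted flux to be compared with the others.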

Figure 1. Using the BNN to learn the highly nonlinear relationship between the contaminant source zone and the dissolved mass flux


Figure 2. The main controlling factors of the multi-stage dissolution process identified by the interpretable machine learning method

Figure 3. Predicted contaminant mass flux and the associated uncertainty quantification provided by the proposed BNN-based model

The study was recently published in Water Resources Research, a prestigious journal in hydrology and water resources, under the title “Modeling upscaled mass discharge from complex DNAPL source zones using a Bayesian Neural Network: prediction accuracy, uncertainty quantification and source zone feature importance”. Dr. Xueyuan Kang (Nanjing University) is the first author of the paper, and Prof. Xiaoqing Shi and Prof. Jichun Wu are the co-corresponding authors. The co-authors include Prof. Amalia Kokkinaki (University of San Francisco), Prof. Jonghyun Lee (University of Hawaii), Prof. Zhilin Guo (Southern University of Science and Technology), and Dr. Lingling Ni (Nanjing Hydraulic Research Institute). This research was jointly funded by the National Natural Science Foundation of China and the AI & AI for Science Project of Nanjing University. The research group has previously developed a series of machine learning methods for high-resolution characterization and simulation of organic contaminant source zone architectures (Kang et al., 2021, 2022). Building on these previously developed methods, this study (Kang et al., 2024) uses interpretable neural networks to help understand the mass-transfer mechanisms of organic contaminant source zones and to establish an upscaled dissolution model. Combined with the high-resolution characterization methods, it will better serve the management and control of contaminated sites.


Information on the published papers

[1] Kang, X., Kokkinaki, A., Shi, X., Lee, J., Guo, Z., Ni, L., Wu, J. (2024). Modeling Upscaled Mass Discharge From Complex DNAPL Source Zones Using a Bayesian Neural Network: Prediction Accuracy, Uncertainty Quantification and Source Zone Feature Importance. Water Resources Research, 60, e2023WR036864. https://doi.org/10.1029/2023WR036864

[2] Kang, X., Kokkinaki, A., Shi, X., Yoon, H., Lee, J., Kitanidis, P. K., Wu, J. (2022). Integration of Deep Learning-Based Inversion and Upscaled Mass-Transfer Model for DNAPL Mass-Discharge Estimation and Uncertainty Assessment. Water Resources Research, 58, e2022WR033277. https://doi.org/10.1029/2022WR033277

[3] Kang, X., Kokkinaki, A., Kitanidis, P. K., Shi, X., Lee, J., Mo, S., Wu, J. (2021). Hydrogeophysical Characterization of Nonstationary DNAPL Source Zones by Integrating a Convolutional Variational Autoencoder and Ensemble Smoother. Water Resources Research, 57, e2020WR028538. https://doi.org/10.1029/2020WR028538



Source: School of Earth Science and Engineering

Correspondent: Wu Yiwen