Imputation of Ammonium Nitrogen Concentration in Groundwater Based on a Machine Learning Method

Abstract
Ammonium is one of the main inorganic pollutants in groundwater, originating largely from agricultural, industrial and domestic sources. Excessive ammonium poses risks to human health and the environment. Its temporal and spatial distribution is affected by factors such as meteorology, hydrology, hydrogeology and land-use type, so a groundwater ammonium analysis based on a limited number of sampling points carries large uncertainties. In this study, organic matter content, groundwater depth, clay thickness, total nitrogen (TN) content, cation exchange capacity (CEC), pH and land-use type were selected as potential contributing factors to establish a machine learning model for fitting the ammonium concentration. The SHapley Additive exPlanations (SHAP) method, which explains the output of a machine learning model, was applied to identify the more significant influencing factors. Finally, a machine learning model built on these more significant factors was used to impute point data in the study area. The results show that soil organic matter had a substantial impact on the modelled ammonium concentration, followed by soil pH, clay thickness and groundwater depth. The ammonium concentration generally decreased from northwest to southeast: the highest values were concentrated in the northwest and northeast, and the lowest values in the southeast, southwest and parts of the east and north. Spatial interpolation based on the machine learning imputation model provides a reliable groundwater quality assessment that is not limited by the number or geographical location of sampling points.
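
To illustrate the kind of attribution that SHAP approximates, the following minimal sketch computes exact Shapley values for a single sample by enumerating all feature subsets. It is not the study's implementation. The function `toy_model`, its coefficients, the sample `x` and the reference point `baseline` are all invented for illustration, and only the four feature names are taken from the abstract. "Missing" features are replaced by their baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial

# Feature names follow the abstract; every numeric value below is hypothetical.
FEATURES = ["organic_matter", "ph", "clay_thickness", "groundwater_depth"]


def toy_model(x):
    """Stand-in for a fitted regressor (assumption, not the study's model)."""
    om, ph, clay, depth = x
    return 0.8 * om - 0.3 * ph + 0.1 * clay * om - 0.05 * depth


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


# Hypothetical sample and reference point (e.g. a site versus an "average" site).
x = [3.0, 7.5, 2.0, 4.0]
baseline = [1.0, 7.0, 1.0, 6.0]
phi = shapley_values(toy_model, x, baseline)

# Rank features by the magnitude of their contribution to this one prediction.
ranking = sorted(zip(FEATURES, phi), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranking:
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`. Averaging their absolute values over many samples would give a global importance ranking of the kind reported in the abstract. Exact enumeration scales exponentially with the number of features, so practical SHAP tools use approximations or model-specific algorithms instead.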
Funding Information
  • National Natural Science Foundation of China (41672231)