Risk prediction of two types of potential snail habitats in Anhui Province of China: Model-based approaches

Abstract
Elimination of the intermediate snail host of Schistosoma is the most effective way to control schistosomiasis, and the most important first step is to accurately identify snail habitats. Traditional manual snail searching in the field requires substantial resources, and remote sensing risks misclassifying potential snail habitats, so more convenient and precise methods are urgently needed. Snail data (N = 15,000) from two types of snail habitats (lake/marshland and hilly areas) in Anhui Province, a typical endemic area for schistosomiasis, were collected together with 36 environmental variables covering the whole province. Twelve different models were built and evaluated using indices such as the area under the curve (AUC), Kappa, percent correctly classified (PCC), sensitivity and specificity. We found that presence-absence models performed better than presence-only models. Models derived from machine learning, in particular the random forest (RF) approach, were preferable, with all indices above 0.90. Distance to the nearest river was the most important variable for the lake/marshland areas, while climatic variables were more important for the hilly endemic areas. The predicted high-risk areas for potential snail habitats of the lake/marshland type lie mainly along the Yangtze River, while those of the hilly type are dispersed across the areas south of the Yangtze River. We provide here the first comprehensive risk profile of potential snail habitats based on precise examinations revealing their true distribution and habitat type, which can improve the efficiency and accuracy of snail control, including better allocation of limited health resources.
Author summary
Schistosomiasis is a parasitic disease caused by worms of the genus Schistosoma. In China, the sole intermediate snail host is Oncomelania hupensis, and its elimination has proved to be the most effective way to interrupt transmission of this disease. However, manual snail searching is labour-intensive, expensive and time-consuming, and it can produce inaccurate results. To find a better approach, we employed and compared 12 models to characterise the typical snail habitats, which differ between lake/marshland and hilly areas. The two types of snail habitats showed notable differences during the modelling process, mainly because different environmental variables shape each habitat type. We further found that habitat characterisation improved the prediction of areas at risk, and that precision was high, especially for models based on machine-learning algorithms such as random forest (RF). The highest level of accuracy was achieved by the support vector machine (SVM) approach and artificial neural networks (ANN). Our study provides new insights into the accurate prediction of the spatial distribution of potential snail habitats, with machine learning as the preferred approach.
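The evaluation indices named above (PCC, sensitivity, specificity, Kappa and AUC) can be made concrete with a small sketch. It assumes binary presence (1) / absence (0) labels and model scores between 0 and 1; the fixed 0.5 cut-off used to turn scores into presence/absence predictions is our assumption, since the text does not state which threshold the study used. AUC is computed here by the rank-based (Mann-Whitney) pairwise comparison, and the toy labels and scores are hypothetical, not study data.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """PCC, sensitivity, specificity and Cohen's Kappa at an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # presences predicted present
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # absences predicted absent
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    pcc = (tp + tn) / n              # percent (proportion) correctly classified
    sensitivity = tp / (tp + fn)     # true-positive rate on presence sites
    specificity = tn / (tn + fp)     # true-negative rate on absence sites
    # Chance agreement: P(pred=1)P(true=1) + P(pred=0)P(true=0).
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (pcc - p_expected) / (1 - p_expected)
    return {"pcc": pcc, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random presence outscores a random absence (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and scores, for illustration only.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    m = threshold_metrics(labels, scores)
    print(round(m["pcc"], 3), round(m["kappa"], 3))  # 0.667 0.333
    print(round(auc(labels, scores), 3))             # 0.889
```

Kappa is included alongside PCC because PCC alone can look high when one class dominates; Kappa discounts the agreement expected by chance. The pairwise AUC loop is quadratic and is meant only to show the definition; a sort-based rank computation scales better to a dataset of the size described here.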
Funding Information
  • National Natural Science Foundation of China (81673239, 81773487)