K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China
Open Access
- 28 June 2021
- journal article
- research article
- Published by MDPI AG in Atmosphere
- Vol. 12 (7), 834
- https://doi.org/10.3390/atmos12070834
Abstract
The application of machine learning algorithms in atmospheric sciences, alongside Earth System Models, has the potential to improve prediction, forecasting, and the reconstruction of missing data. In the current study, a combination of two machine learning techniques, namely the K-means and decision tree (C4.5) algorithms, is used to separate observed precipitation into clusters and to classify the associated large-scale circulation indices. Observed precipitation from the China Meteorological Administration (CMA) during 1961–2016 for 83 stations in the Poyang Lake basin (PLB) is used. The K-Means results show two precipitation clusters that split the PLB precipitation into a northern and a southern cluster, with a silhouette coefficient of ~0.5. The leading PLB precipitation cluster (C1) contains 48 stations, accounting for 58% of the stations in the region, while Cluster 2 (C2) contains 35, accounting for 42%. The interannual variability of precipitation differed significantly between the two clusters. The C4.5 decision tree is then employed to identify the large-scale atmospheric indices from the National Climate Center (NCC), taken from the preceding spring season as predictors, that are associated with each cluster. C1 precipitation was linked with the location and intensity of the subtropical ridge line position over Northern Africa, whereas C2 precipitation was suggested to be associated with the Atlantic-European Polar Vortex Area Index. The precipitation anomalies further validated the results of both algorithms. The findings are in accordance with previous studies conducted globally and hence support the application of machine learning techniques in atmospheric science at sub-regional and sub-seasonal scales. Future studies should explore the dynamics of the K-Means- and C4.5-derived indicators for a better assessment at the regional scale. This research, based on machine learning methods, may offer a new approach to climate forecasting.
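To illustrate the clustering step, the sketch below runs K-means (Lloyd's algorithm) with k = 2 on per-station feature vectors and scores the resulting partition with the mean silhouette coefficient, s(i) = (b - a) / max(a, b). It is a minimal sketch under stated assumptions: each station is described by a numeric vector (for example, a standardized precipitation series), distances are Euclidean, and centroids are seeded by farthest-point initialization. The abstract does not state the features, distance, or initialization actually used, and the toy coordinates are hypothetical, not CMA observations.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points: List[Vector], k: int) -> List[List[float]]:
    # Assumed seeding: start from the first station, then repeatedly add the
    # station whose nearest existing centroid is farthest away.
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each station joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no station changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member stations.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    # Mean of s(i) = (b - a) / max(a, b); requires at least two clusters.
    scores = []
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D station features: two well-separated groups.
north = [[1.0, 1.2], [1.1, 0.9], [0.9, 1.0]]
south = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
stations = north + south
labels = kmeans(stations, k=2)
score = silhouette(stations, labels)
print(labels, round(score, 2))
```

A silhouette near 1 indicates compact, well-separated clusters; real station data are far noisier, which is consistent with a value around 0.5 still being read as a meaningful two-cluster split.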
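The decision-tree step rests on C4.5's split criterion, the gain ratio: the information gain of a split divided by the split information (the entropy of the partition sizes). The sketch below picks the best binary threshold split over continuous predictors using that criterion. It is a simplified, hypothetical sketch: the predictor names `index_a` and `index_b` and the wet/dry labels are placeholders rather than NCC indices or study results, candidate thresholds are midpoints between consecutive distinct values, and full C4.5 additionally applies an average-gain filter, pruning, and missing-value handling, all omitted here.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


def best_split(
    features: Dict[str, Sequence[float]], labels: Sequence[str]
) -> Tuple[Optional[str], Optional[float], float]:
    best: Tuple[Optional[str], Optional[float], float] = (None, None, 0.0)
    for name, values in features.items():
        xs = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(xs, xs[1:]):
            t = (lo + hi) / 2
            g = gain_ratio(values, labels, t)
            if g > best[2]:
                best = (name, t, g)
    return best


# Hypothetical preceding-spring indices and summer precipitation classes.
predictors = {
    "index_a": [0.2, 0.4, 0.3, 1.5, 1.7, 1.6],
    "index_b": [1.0, 2.0, 1.5, 1.2, 2.1, 1.4],
}
classes = ["dry", "dry", "dry", "wet", "wet", "wet"]
name, threshold, ratio = best_split(predictors, classes)
print(name, threshold, ratio)
```

Dividing by split information penalizes splits that fragment the data into many small, uneven groups, which is the main difference from plain information gain (ID3).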
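The anomaly check used to validate both algorithms can be sketched as follows, assuming anomalies are departures of a cluster's seasonal precipitation from its long-term mean, and that composites average those anomalies over the years falling in each predicted class. The abstract does not specify the base period, season, or compositing rule, and the years and totals below are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable


def anomalies(totals: Dict[int, float]) -> Dict[int, float]:
    """Departure of each year's total from the mean over all years supplied."""
    climatology = mean(totals.values())
    return {year: p - climatology for year, p in totals.items()}


def composite(anoms: Dict[int, float], years: Iterable[int]) -> float:
    """Mean anomaly over a subset of years, e.g. those a classifier labels 'wet'."""
    return mean(anoms[y] for y in years)


# Hypothetical seasonal totals (mm) for one cluster, not CMA observations.
totals = {2001: 1500.0, 2002: 1700.0, 2003: 1600.0, 2004: 1200.0}
anoms = anomalies(totals)
wet = composite(anoms, [2002, 2003])  # years hypothetically classed "wet"
dry = composite(anoms, [2004])        # years hypothetically classed "dry"
print(wet, dry)  # 150.0 -300.0
```

If the tree's classes carry real signal, composites for predicted wet years should be positive and those for predicted dry years negative.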
Funding Information
- The National Key R&D Program of China (2019YFC1510203, HRM201602)