Modeling the Relationship between Rice Yield and Climate Variables Using Statistical and Machine Learning Techniques

Abstract
This paper applies several statistical and machine learning techniques to model the relationship between rice yield and climate variables in a major region of Sri Lanka that contributes significantly to the country's paddy harvest. The climatic factors considered are rainfall, minimum and maximum temperature, evaporation, average wind speed (morning and evening), and sunshine hours. Rice harvest and yield data from the last three decades, together with monthly climatic data, were used to develop prediction models based on artificial neural networks (ANNs), support vector machine regression (SVMR), multiple linear regression (MLR), Gaussian process regression (GPR), power regression (PR), and robust regression (RR). Each model was evaluated using the mean squared error (MSE), correlation coefficient (R), mean absolute percentage error (MAPE), root mean squared error to standard deviation ratio (RSR), BIAS, and the Nash-Sutcliffe efficiency (Nash number), and the GPR-based model was found to be the most accurate. Climate data collected up to early 2019 (the 2018 Maha season) were used to develop the models, and an independent validation was performed using data from the 2019 Yala season. The resulting model can be used to forecast future rice yields with high accuracy.
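The abstract names the evaluation metrics but not their formulas. The following is a minimal sketch of how they are commonly computed, assuming the usual hydrology-style definitions: RSR as the RMSE divided by the standard deviation of the observations, the Nash number as the Nash-Sutcliffe efficiency, and BIAS as the mean of predicted minus observed values. The BIAS definition in particular is an assumption, since some studies use a percentage bias instead. The function name `evaluate` and the sample yields are hypothetical and are not taken from the study.

```python
import math


def evaluate(observed, predicted):
    """Return goodness-of-fit metrics for paired observed/predicted yields.

    Assumed definitions (not stated in the abstract):
      MSE  = mean((o - p)^2)
      R    = Pearson correlation between o and p
      MAPE = 100 * mean(|o - p| / |o|)
      RSR  = sqrt(sum((o - p)^2)) / sqrt(sum((o - mean(o))^2))
      BIAS = mean(p - o)
      NSE  = 1 - sum((o - p)^2) / sum((o - mean(o))^2)   # "Nash number"
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    sse = sum(r * r for r in residuals)                # sum of squared errors
    sst = sum((o - mean_o) ** 2 for o in observed)     # spread of observations
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "MSE": sse / n,
        "R": cov / math.sqrt(sst * var_p),
        "MAPE": 100.0 * sum(abs(r) / abs(o) for r, o in zip(residuals, observed)) / n,
        "RSR": math.sqrt(sse / sst),
        "BIAS": sum(p - o for o, p in zip(observed, predicted)) / n,
        "NSE": 1.0 - sse / sst,
    }


# Hypothetical yields (t/ha), for illustration only.
metrics = evaluate([4.0, 4.5, 5.0, 5.5], [4.1, 4.4, 5.2, 5.3])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, RSR and NSE are linked by NSE = 1 - RSR², so a model with a low RSR necessarily has an NSE close to 1.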