Big Data Application in Education: Dropout Prediction in Edx MOOCs

Abstract
Educational Data Mining and Learning Analytics are two growing fields of study that try to make sense of education data and to improve the teaching and learning experience. We study dropout prediction in Massive Open Online Courses (MOOCs), where the goal is, given one month of a student's learning behavior log data, to predict whether the student will drop out in the next ten days. We collect data from 39 courses on the XuetangX platform, which is based on the open-source edX platform. We describe our complete approach to the dropout prediction task, including data extraction from the edX platform, data preprocessing, feature engineering, and performance tests on several supervised classification models: SVM, Logistic Regression, Random Forest, and Gradient Boosting Decision Tree (GBDT). We achieve 88% accuracy on the dropout prediction task with the GBDT model.
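The task setup described in the abstract (one month of behavior logs as input, dropout within the following ten days as the target) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `build_example`, the event types `"video"` and `"problem"`, the chosen features, and the sample dates are all hypothetical, and the sketch assumes that a student counts as a dropout when no event at all is logged during the ten-day prediction window.

```python
from collections import Counter
from datetime import date, timedelta

OBSERVATION_DAYS = 30  # "one month" of behavior logs (from the abstract)
PREDICTION_DAYS = 10   # "next ten days" (from the abstract)


def build_example(events, start):
    """Turn one student's (day, event_type) log into (features, label).

    Assumption: label 1 (dropout) means no event is logged during the
    prediction window that follows the observation window.
    """
    obs_end = start + timedelta(days=OBSERVATION_DAYS)
    pred_end = obs_end + timedelta(days=PREDICTION_DAYS)

    # Split the log into the observation window and the prediction window.
    observed = [(d, kind) for d, kind in events if start <= d < obs_end]
    future = [d for d, _ in events if obs_end <= d < pred_end]

    counts = Counter(kind for _, kind in observed)
    active_days = {d for d, _ in observed}
    last_active = max(active_days) if active_days else None

    # Hypothetical hand-crafted features; the paper's feature set is not given.
    features = {
        "n_events": len(observed),
        "n_active_days": len(active_days),
        "days_since_last_event": (
            (obs_end - last_active).days if last_active else OBSERVATION_DAYS
        ),
        "n_video": counts["video"],
        "n_problem": counts["problem"],
    }
    label = 0 if future else 1
    return features, label


# Hypothetical log for a single student.
start = date(2015, 1, 1)
log = [
    (date(2015, 1, 2), "video"),
    (date(2015, 1, 2), "problem"),
    (date(2015, 1, 20), "video"),
    (date(2015, 2, 3), "video"),  # falls inside the prediction window
]
features, label = build_example(log, start)
print(features, label)
```

Feature dictionaries and labels built this way, one per student and course, could then be passed to any of the supervised classifiers named in the abstract, such as a gradient boosting decision tree.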
