Predictive Modeling of Student Dropout in MOOCs

Please note this is a comparison between Version 1 by Georgios Psathas and Version 2 by Lindsay Dong.

The features of massive open online courses (MOOCs), such as internet-based massiveness, openness, and flexible learning, attract a large and highly diverse population of learners, which makes predicting learner success (and providing support based on these predictions) particularly challenging.

Massive Open Online Courses
prediction
MOOCs

1. Massive Open Online Courses (MOOCs)

MOOCs are a new distance education model established in 2008 by George Siemens. Many MOOC platforms offer an educational design built around video lectures, announcements, forums, and assessments (quizzes, assignments, etc.) [1,2]. Some MOOCs allow students to progress at their own pace, while others follow a predetermined schedule [3]. The acquisition of completion certificates serves as a motivation for many students [4]. Nevertheless, most students fail to complete MOOCs successfully, even if they intend to do so [5]. The challenges of MOOCs include the absence of a supporting and guiding instructor [6], limited social interaction between teachers and students [7], and, most critically, the high dropout rates observed within the MOOC environment [3]. In the literature, dropout rates are cited as 93.5% [8] or 91–93% [9].

Many studies have explored dropout in MOOCs extensively. Several researchers have reported very low completion rates, falling below 10% and even as low as 5% [3,10,11,12]. Although these figures may appear alarming, such an assessment assumes that enrolling in a MOOC is comparable to enrolling in a traditional course, which is not always the case, because the intentions of individuals enrolling in a MOOC differ: some seek professional development, while others simply pursue information or entertainment [11]. It is also essential to acknowledge some positive exceptions to the high dropout rates. For instance, programming MOOCs have demonstrated retention rates above 60% [3,13].

Identifying and exploring the factors that directly influence student attrition in MOOCs will enable researchers and educators to examine novel strategies and techniques to enhance students’ persistence and successful course completion. Dalipi et al. [14] have categorized the factors contributing to high dropout rates into those associated with students (such as lack of motivation, poor time management, and inadequate background knowledge and skills) and those related to MOOCs (course design, lack of interactions, hidden costs).

Ihantola et al. [3] investigated the attrition rates of students in MOOCs with flexible versus strict scheduling. The findings revealed that students enrolled in MOOCs with flexible scheduling were more likely to drop out early than those in MOOCs with rigid schedules. In their study, approximately 17% of students in the strictly scheduled MOOC abandoned the course within the first week, while the corresponding rate for the flexible MOOC was 50%. After the initial week, however, the dropout behavior in the two versions became similar.

Furthermore, the researchers observed that both versions of MOOCs had students who completed all computer programming assignments within a week but did not continue further, possibly due to perceiving a heavy workload. Therefore, the authors suggest that identifying the profiles of students who benefit from each type of MOOC could lead to novel, more effective methods of organizing and grading courses. Previous studies have also identified that the lack of a sense of community and ineffective social interactions and collaborations contribute to the high attrition rates in MOOCs [7,14,15].

Hone and El Said [16] also investigated factors influencing retention in MOOCs and found that 32.2% of students successfully completed their preferred courses, a rate surpassing the average completion rate. The main driver for completion was satisfaction with the course content, which was perceived as unique and not readily available elsewhere. Non-completers, however, identified several reasons for discontinuing, including feelings of isolation due to inadequate communication channels, the perceived complexity and technical difficulty of the courses, and a lack of engagement. A related study by Zhang [17] explored how to enhance MOOC attractiveness by aligning courses with students’ regulatory foci. The observations made in this study indicate that students with promotion-focused mindsets were more influenced by advocates emphasizing gains and positive outcomes, while prevention-focused students responded better to advocates stressing the avoidance of losses.

2. Prediction of Dropout and the SRL Factor

According to Gardner and Brooks (2018) [2], supervised learning techniques are used far more extensively than unsupervised approaches in predictive student modeling in MOOCs, because student dropout/stopout is easily observable and therefore provides a ready label. In supervised learning, a model is trained on labelled data and, on the basis of that data, predicts the output for new cases. The authors chose to present, as an indication, popular techniques for MOOC learner modeling that have shown very good empirical performance in large-scale MOOC modeling studies (e.g., Dass et al., 2021 [18]). Logistic regression (LR) and support vector machines (SVM) are among the most used, while naive Bayes (NB), k-nearest neighbors (kNN), and decision trees (DT) appear less frequently in surveys [2]. No single algorithm consistently stands out from the others. According to Herrmannova et al., 2015 [19], each model captures different properties of the input data, and the results are complementary.

Self-regulated learning (SRL) is a complex multidimensional phenomenon often described by a set of individual cognitive, social, metacognitive, and behavioral processes embedded in a cyclical model. Students who make limited use of SRL strategies do not perform well [1,20,21,22]. Zimmerman [23] proposed a cyclical model consisting of three interrelated phases of the learning process. In the forethought phase, self-regulated learners set learning goals and design the strategy for their learning. This is followed by the performance and control phase, during which self-regulated learners employ strategies to process the learning material: they seek help when needed, manage their time, structure their environment, and monitor their learning processes. In the third phase, self-reflection, self-regulated learners evaluate their performance and adjust their strategies to achieve their learning goals [1,18,22,24,25].
Research using questionnaires has shown positive correlations between SRL activity and the completion of MOOCs [24]. Despite the limitations of self-reported data [26], the large sample size enhances their usefulness. On the other hand, the use of trace data in measuring SRL has increased, but their interpretation remains challenging [26]. Jansen et al. [24] propose the combined use of SRL data from traces and questionnaires. Timely SRL support interventions in MOOCs should be considered significant pedagogical tools that contribute to achieving positive outcomes for students [20,21].

The features of MOOCs, such as internet-based massiveness, openness, and flexible learning, attract a large and highly diverse population of learners, which makes predicting learner success (and providing support based on these predictions) particularly challenging. Several researchers have developed prediction models by employing machine learning (ML) algorithms [27] and adopting supervised, unsupervised, and semi-supervised architectures [28]. For instance, Moreno-Marcos et al. [29] applied a combination of random forest (RF), generalized linear model (GLM), support vector machines (SVM), and decision trees (DT). Deep learning methods are also utilized for predicting dropout: Feng et al. [10] used logistic regression (LR), a support vector machine with a linear kernel (SVM), random forest (RF), gradient boosting decision tree (GBDT), and a three-layer deep neural network (DNN) in their analysis. Diverse features, even in limited quantities, provide a more comprehensive, multidimensional view of learners and can improve the quality of predictive models. Collecting additional data, especially during the initial weeks of a course, enhances prediction performance [2,28].
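
The supervised setup discussed in this section can be illustrated with a minimal sketch: a logistic regression model, written from scratch with the Python standard library, that learns a dropout label from first-week activity. Everything in it is an assumption made for illustration only: the three features (videos watched, quiz attempts, forum posts), the synthetic learner generator, the learning rate, and the function names are hypothetical and are not taken from any of the cited studies.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_learners(n, seed):
    """Generate hypothetical week-1 features and dropout labels (1 = dropped out)."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        engaged = rng.random() < 0.3  # assumed share of engaged learners
        videos = max(rng.gauss(8.0 if engaged else 2.0, 1.5), 0.0)
        quizzes = max(rng.gauss(3.0 if engaged else 0.5, 0.8), 0.0)
        posts = max(rng.gauss(1.5 if engaged else 0.2, 0.5), 0.0)
        features.append([videos, quizzes, posts])
        # Assumed dropout probabilities: low for engaged, high for others.
        p_drop = 0.2 if engaged else 0.95
        labels.append(1 if rng.random() < p_drop else 0)
    return features, labels


def predict_dropout_probability(weights, bias, x):
    """Estimated probability that a learner with features x drops out."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=400):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            error = predict_dropout_probability(weights, bias, x) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, features, labels):
    """Share of learners whose label matches the 0.5-threshold prediction."""
    hits = sum(
        (predict_dropout_probability(weights, bias, x) >= 0.5) == (y == 1)
        for x, y in zip(features, labels)
    )
    return hits / len(features)


# Train on one synthetic cohort, evaluate on a separate held-out cohort.
train_x, train_y = make_learners(400, seed=0)
test_x, test_y = make_learners(200, seed=1)
weights, bias = train_logistic_regression(train_x, train_y)
test_accuracy = accuracy(weights, bias, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

In practice the features would come from platform logs rather than a generator, and a library implementation with regularization and feature scaling would normally replace the hand-written training loop.
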
For successful timely interventions, predictive models need to be transferable, meaning that they perform well on new course iterations by utilizing historical data [30,31,32]. Some researchers have examined specific aspects of SRL and observed their impact on predicting success, including goal setting and strategic planning [6], student-programmed plans [33], and the combination of self-reported SRL strategies with patterns of interaction sequences, demographic features, and intentions [34]. For example, Kizilcec et al. (2017) [6] investigated which specific SRL strategies predict the attainment of personal course goals and how they map onto specific interactions with the MOOCs’ online content. To form a longitudinal account of SRL, the authors combined learners’ self-reported SRL strategies and characteristics, achievement data, and records of individuals’ engagement with the course content. Using multiple linear and logistic regression modeling, Maldonado-Mahauad et al. (2018) [34] concluded that the factors contributing to the prediction of MOOC learners’ success include specific self-reported SRL strategies (“goal setting”, “strategic planning”, “elaboration” and “help seeking”), complex behavioral data from the MOOC platform such as meaningful online activity sequence patterns, self-reported prior experience and level of interest in the MOOC’s assessments, and total time spent online.
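
The notion of transferability mentioned above can be made concrete with a small, hedged sketch: a model is fitted only on a historical course run and then scored on a later run whose learners behave slightly differently. The nearest-centroid classifier, the two features (active days and submitted assignments in week 1), the simulated runs, and the `activity_shift` parameter are all hypothetical choices for illustration; they do not reproduce the methods of the cited studies.

```python
import random
import statistics


def simulate_iteration(n, seed, activity_shift=0.0):
    """Hypothetical course run: ((active_days, submissions), label), 1 = dropout."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        completes = rng.random() < 0.1  # assumed completion share of about 10%
        days = rng.gauss((5.0 if completes else 1.5) + activity_shift, 1.0)
        subs = rng.gauss((3.0 if completes else 0.5) + activity_shift, 0.7)
        rows.append(((days, subs), 0 if completes else 1))
    return rows


def fit_centroids(rows):
    """Fit feature scaling and one centroid per class on the given run only."""
    columns = list(zip(*(x for x, _ in rows)))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(x):
        # Scaling statistics come from the historical run, never the new one.
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    centroids = {}
    for label in (0, 1):
        members = [scale(x) for x, y in rows if y == label]
        centroids[label] = [statistics.mean(col) for col in zip(*members)]
    return scale, centroids


def predict(model, x):
    """Return the label of the closest class centroid in scaled feature space."""
    scale, centroids = model
    z = scale(x)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(z, centroids[label])),
    )


def accuracy(model, rows):
    """Share of learners in rows whose label is predicted correctly."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Fit on the historical run, then score both runs with the same frozen model.
historical_run = simulate_iteration(600, seed=1)
new_run = simulate_iteration(600, seed=2, activity_shift=0.5)
model = fit_centroids(historical_run)
acc_historical = accuracy(model, historical_run)
acc_new = accuracy(model, new_run)
print(f"historical run: {acc_historical:.3f}, new run: {acc_new:.3f}")
```

Comparing the two scores gives a rough indication of how much performance is lost when the learner population changes between iterations; a real evaluation would use logged data from actual course runs and metrics suited to imbalanced classes.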