Comparison Of Four Methodologies For Modeling Student Retention In Engineering
Author(s) - P.K. Imbrie, Joe Lin, Kenneth Reid
Publication year - 2020
Publication title - Papers on Engineering Education Repository (American Society for Engineering Education)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.18260/1-2--16677
Subject(s) - metacognition, cognition, computer science, expectancy theory, logistic regression, mathematics education, artificial intelligence, regression analysis, teamwork, structural equation modeling, multilevel model, psychology, machine learning, social psychology, neuroscience, political science, law
Several methodologies based on statistical methods or machine learning theory have been applied in previous studies to model student retention. However, most prior studies relied solely on a single modeling method of the authors’ choice, and direct comparisons of competing methods on an identical collection of student retention data were rarely provided. The purpose of this paper is to present a direct comparison of prominent methods for modeling student retention using the same data. Four modeling methodologies (neural networks, logistic regression, discriminant analysis, and structural equation modeling) are included in this study. These competing methods were implemented on five retention models with various collections of cognitive and non-cognitive factors, ranging from 9 to 71 variables. The retention data were collected from more than 1500 first-year engineering students at a large Midwestern university. The eleven cognitive attributes include high school GPAs, standardized test scores, and the grades and number of semesters in math, science, and English courses in high school. The non-cognitive variables were collected through the Student Attitudinal Success Instrument (SASI), covering the following nine constructs: Leadership, Deep Learning, Surface Learning, Teamwork, Academic Self-efficacy, Motivation, Metacognition, Expectancy-value, and Major Decision. The study yields the following findings. First, among the five retention models, the two hybrid models with both cognitive and non-cognitive factors always perform better than models consisting of only cognitive or only non-cognitive factors. Second, adding non-cognitive items can significantly improve the prediction performance of a cognitive-only model when applied properly. Third, neural network methods perform better than the other three methodologies on the performance indices, followed by logistic regression. However, logistic regression may be attractive to some researchers for its ease of implementation and lower computational requirements. Finally, the authors found that the commonly used threshold (0.05) for including variables in the stepwise selection process in logistic regression may not result in the best model for prediction performance. The authors strongly suggest that researchers explore beyond this typical threshold to find the best-performing collection of variables.
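The final finding, that the entry threshold in stepwise selection is worth varying, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' procedure: the data are synthetic, the selection is forward-only with a likelihood-ratio test for entry (the abstract does not say which stepwise variant or test was used), the model is fitted by plain gradient ascent, and the names `fit_loglik` and `forward_select` are hypothetical.

```python
import math
import random


def fit_loglik(rows, y, cols, iters=200, lr=1.0):
    """Fit logistic regression on the chosen columns by gradient ascent
    and return the (approximately maximized) log-likelihood."""
    n = len(y)
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * len(w)
        for row, target in zip(rows, y):
            x = [1.0] + [row[c] for c in cols]
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 0.5 * (1.0 + math.tanh(0.5 * z))  # overflow-safe sigmoid
            for j, xj in enumerate(x):
                grad[j] += (target - p) * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    ll = 0.0
    for row, target in zip(rows, y):
        z = w[0] + sum(wi * row[c] for wi, c in zip(w[1:], cols))
        # log-likelihood term y*z - log(1 + exp(z)), computed stably
        ll += target * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll


def forward_select(rows, y, n_features, p_enter):
    """Forward selection: repeatedly add the candidate with the smallest
    likelihood-ratio p-value, stopping once that p-value is >= p_enter."""
    selected = []
    ll_current = fit_loglik(rows, y, selected)
    while len(selected) < n_features:
        best = None
        for c in range(n_features):
            if c in selected:
                continue
            ll_new = fit_loglik(rows, y, selected + [c])
            lr_stat = max(0.0, 2.0 * (ll_new - ll_current))
            # Upper tail of a chi-square distribution with 1 degree of freedom
            p_value = math.erfc(math.sqrt(lr_stat / 2.0))
            if best is None or p_value < best[0]:
                best = (p_value, c, ll_new)
        if best[0] >= p_enter:
            break
        selected.append(best[1])
        ll_current = best[2]
    return selected


# Synthetic data (assumed): feature 0 is strong, feature 1 weak, feature 2 noise.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(150)]
y = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(2.0 * r[0] + 0.3 * r[1]))) else 0
    for r in rows
]

for p_enter in (0.05, 0.25):
    print(p_enter, forward_select(rows, y, 3, p_enter))
```

Because the greedy path is identical until the stopping rule fires, a looser threshold can only extend the variable set chosen at 0.05. Whether the extra variables help prediction would have to be checked on held-out data, which this sketch does not do.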