Use of the university’s enrolment campaign database for the development of a computer model to predict student expulsion
Author(s) -
A. V. Zharikov,
E. V. Zhuravlev,
Oleg Zhurenkov,
D. Yu Kozlov
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1615/1/012014
Subject(s) - computer science, scripting language, data set, set (abstract data type), database, test data, sql, test (biology), data mining, process (computing), programming language, artificial intelligence, paleontology, biology
The article discusses the construction of a computer model that predicts whether students will run into problems during their studies at the university. Two data sources of Altai State University were used for this purpose: “Admissions Office” (the enrollee database) and “Dean’s Office” (the student database), covering 2013-2018. These data were combined using purpose-built SQL scripts. Analysis of the combined data set revealed difficulties typical of data analysis tasks: incomplete and inconsistent records and cases in which one and the same entity is named differently, among other issues. To solve these problems, we wrote a script in the R programming language that uses regular expressions to unify and standardize the data, and we restored missing values using information from other fields of the data set. We then discarded variables with near-zero variance, which could not contribute significantly to the predictive model. After that, the data set was divided into two parts: the 2013-2017 data were used to build a predictive model with the logistic regression algorithm, while the 2018 data were used to predict whether a particular student would be expelled. The 2013-2017 data were in turn divided into training and test samples in the proportion of 90% to 10%, respectively. Testing of the model, which was built in R, showed satisfactory accuracy; the most significant factors affecting student expulsion were also identified. The paper substantiates the economic feasibility of using the developed computer model at the university.
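
The steps named in the abstract (regex-based unification, removal of near-zero-variance variables, a 90%/10% train/test split, and logistic regression) can be illustrated with a minimal sketch. This is not the authors' code: the paper's model was written in R, while the sketch below uses plain Python with synthetic data. Every name in it (`normalize_name`, `informative_columns`, `split_90_10`, `fit_logistic`), the variance threshold, the learning rate, and the toy "score below 0.4 means expelled" rule are assumptions made only for illustration.

```python
import math
import random
import re
from statistics import pvariance


def normalize_name(raw):
    """Lowercase, strip punctuation, and collapse whitespace so that
    differently written variants of one entity compare equal."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def informative_columns(rows, threshold=1e-3):
    """Indices of columns whose variance exceeds the (assumed) threshold."""
    return [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > threshold]


def split_90_10(rows, labels, seed=0):
    """Shuffle, then hold out 10% of the records as the test sample."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(0.1 * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated probability of expulsion; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(rows, labels, lr=1.0, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Synthetic stand-in data: one informative score in [0, 1] plus a constant column.
rng = random.Random(42)
scores = [rng.random() for _ in range(200)]
rows = [[s, 1.0] for s in scores]
labels = [1 if s < 0.4 else 0 for s in scores]  # 1 = expelled (toy rule)

keep = informative_columns(rows)  # the constant column is filtered out
rows = [[r[j] for j in keep] for r in rows]
x_train, y_train, x_test, y_test = split_90_10(rows, labels)
weights = fit_logistic(x_train, y_train)
hits = sum((predict_proba(weights, x) >= 0.5) == bool(y)
           for x, y in zip(x_test, y_test))
accuracy = hits / len(x_test)
print(f"kept columns: {keep}, test accuracy: {accuracy:.2f}")
```

The hand-written gradient descent keeps the sketch dependency-free; in practice a statistics library would normally be used instead (in R, for instance, `glm` with `family = binomial` fits a logistic regression). The sign and size of each fitted weight give a rough indication of which factors push the predicted probability of expulsion up or down, which is the kind of factor ranking the abstract mentions.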
