
Implementing Machine Learning Algorithms to Predict Donor Status: Preliminary Work with Data from an Institution of Higher Learning
Author(s) -
Cecilia Coulter,
Paula Baingana,
Pascaline Mukakamari
Publication year - 2020
Publication title -
Actas del Congreso Internacional de Ingeniería de Sistemas
Language(s) - English
Resource type - Conference proceedings
ISSN - 2810-806X
DOI - 10.26439/ciis2019.5527
Subject(s) - machine learning, undersampling, computer science, artificial intelligence, metric (unit), resampling, statistical classification, random forest, receiver operating characteristic, algorithm, predictive power, data mining, engineering, philosophy, operations management, epistemology
Identifying potential donors allows institutions of higher learning to conduct more effective fundraising campaigns. Machine learning classification algorithms can be useful in building models to predict donor status. However, when the data contain imbalanced classes, as ours did, models tend to over-predict the majority class, which in this case was non-donors. This bias has significant implications for institutions, because they may fail to pursue entities that could, in fact, become donors. To improve the usefulness of our model, we balanced the data with a resampling technique called random undersampling (RUS) and evaluated performance with the area under the receiver operating characteristic curve (AUC-ROC). Our final model improved its predictive power from 67% to 76%. Institutions of higher learning can use this machine learning model to target the pool of potential donors more efficiently, saving money and time. Future research will focus on improving the predictive accuracy of our model by exploring other data manipulation techniques that minimize the effect of imbalanced data, changing the thresholds of classification algorithms, and using genetic programming and feature engineering.
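The two techniques named in the abstract, random undersampling and AUC-ROC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_undersample` and `auc_roc`, the 90/10 class split, and the toy records are all assumptions made for illustration, and only the Python standard library is used. The sketch assumes binary labels, with 1 for donor and 0 for non-donor.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # Sample without replacement so no record is duplicated.
        for row in rng.sample(members, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outranks a random
    negative, counting ties as one half (the rank-based formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 90 non-donors (0) and 10 donors (1).
toy_rows = [{"id": i} for i in range(100)]
toy_labels = [0] * 90 + [1] * 10

balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels, seed=42)
print(Counter(balanced_labels))  # both classes now have 10 records

# Hypothetical classifier scores for four held-out records.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a setup like this, undersampling would normally be applied to the training split only, so that AUC-ROC is computed on held-out data that keeps the original class proportions. Whether the authors followed that protocol is not stated in the abstract.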