
Research on prediction of online purchasing behavior based on hybrid model
Author(s) -
Xi Chen,
Shichou Ding,
Yang Xiang,
Lin Liu
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1827/1/012075
Subject(s) - computer science, decision tree, machine learning, artificial intelligence, data mining, association rule learning, boosting (machine learning), gradient boosting, classifier (uml), purchasing, decision tree learning, scalability, logistic regression, random forest, database, engineering, operations management
Studying users' potential purchase behavior can help merchants develop better marketing strategies. At present, many methods for predicting online purchasing behavior rely on simple rules, and their prediction results are unsatisfactory. We design a hybrid model that combines a Gradient Boosting Decision Tree (GBDT) with logistic regression (LR) and exploits the association features between users and commodities to predict users' purchase behavior accurately. First, a clustering algorithm and association rules are used to address data imbalance and to mine additional potentially relevant features. This scheme not only improves the efficiency of processing large data sets but also addresses the user cold-start problem. Second, we train a scalable tree boosting system (XGBoost), a strong classifier composed of several weak classifiers, on the initial feature set. Through feature reconstruction, the resulting new features are combined with the original features to form a new training set. Finally, the LR model is trained on this new training set, which completes the hybrid machine learning system. Compared with existing schemes, the integrated decision tree model can train on more samples with fewer resources. The experimental results show that the hybrid model achieves higher accuracy and a higher F1 score than a single model.
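The second and third stages of the pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: scikit-learn's GradientBoostingClassifier stands in for XGBoost, the "new features" are taken to be one-hot encoded leaf indices of the boosted trees (a common way to build GBDT + LR hybrids), and the data are synthetic. The function names fit_hybrid and predict_hybrid are hypothetical, and the clustering and association-rule preprocessing step is omitted.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def fit_hybrid(X_train, y_train, n_trees=50, seed=0):
    """Fit a boosted-tree model, then an LR model on original + leaf features."""
    # Stage 1: boosted trees trained on the initial feature set
    # (assumption: a stand-in for the XGBoost model named in the abstract).
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=3, random_state=seed
    )
    gbdt.fit(X_train, y_train)

    # For binary classification, apply() returns an array of shape
    # (n_samples, n_estimators, 1) holding the leaf index reached in each tree.
    leaves = gbdt.apply(X_train)[:, :, 0]

    # Feature reconstruction (assumed form): one-hot encode the leaf indices
    # and concatenate them with the original features.
    encoder = OneHotEncoder(handle_unknown="ignore")
    leaf_features = encoder.fit_transform(leaves)
    X_new = hstack([csr_matrix(X_train), leaf_features]).tocsr()

    # Stage 2: logistic regression trained on the new training set.
    lr = LogisticRegression(max_iter=1000)
    lr.fit(X_new, y_train)
    return gbdt, encoder, lr


def predict_hybrid(model, X):
    """Predict labels by passing X through the trees, the encoder, and the LR."""
    gbdt, encoder, lr = model
    leaves = gbdt.apply(X)[:, :, 0]
    X_new = hstack([csr_matrix(X), encoder.transform(leaves)]).tocsr()
    return lr.predict(X_new)


# Synthetic, imbalanced binary data (roughly 10% positives) standing in for
# purchase / no-purchase labels; this is not the data set used in the paper.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = fit_hybrid(X_tr, y_tr)
print("hybrid F1 on held-out data:", f1_score(y_te, predict_hybrid(model, X_te)))
```

In this sketch the trees act as a learned feature transformer: each leaf corresponds to a conjunction of threshold rules over the original features, so the LR model receives both the raw inputs and these non-linear combinations. The score printed on synthetic data says nothing about the results reported in the paper.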