
Machine learning-based e-commerce platform repurchase customer prediction model
Author(s) -
Cheng-Ju Liu,
Tienshou Huang,
Ping-Tsan Ho,
Jui-Chan Huang,
Ching-Tang Hsieh
Publication year - 2020
Publication title -
PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0243105
Subject(s) - computer science , decision tree , machine learning , artificial intelligence , decision tree model , data mining , fuse (electrical) , predictive modelling , support vector machine , logistic regression , decision tree learning , tree (set theory) , scale (ratio) , mathematics , engineering , mathematical analysis , electrical engineering , physics , quantum mechanics
In recent years, China's e-commerce industry has grown rapidly, and the scale of the industries around it has continued to expand. Service-oriented enterprises in areas such as e-commerce transactions and information technology have emerged. This paper analyzes the shortcomings and challenges of traditional methods for predicting online shopping behavior and proposes a system for analyzing and predicting that behavior. Two models are chosen: logistic regression, a linear model, and XGBoost, a decision-tree-based model. After the models are optimized, the nonlinear model is found to make better use of the features and to give better prediction results. We first combine the single models and then use a model fusion algorithm to fuse their prediction results. The aim is to offset the linear model's tendency to under-fit and the decision tree model's tendency to over-fit. The results show that the fused model improves further on the single models. Finally, two sets of comparative experiments show that the algorithm selected in this paper filters the features effectively, which reduces the complexity of the model to a certain extent and improves classification accuracy. The XGBoost hybrid model based on p/n samples is simpler than a single model. It is also less prone to over-fitting and therefore more robust.
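The abstract does not say which fusion algorithm is used, so the following is only a minimal sketch of one common scheme: a weighted average of the positive-class probabilities produced by a linear model and a tree-based model. The function names, the `weight_linear` parameter, and the sample scores are all hypothetical and are not taken from the paper.

```python
def fuse_probabilities(p_linear, p_tree, weight_linear=0.5):
    """Weighted average of two models' positive-class probabilities.

    Assumption: simple weighted averaging stands in for the paper's
    fusion algorithm, which the abstract does not specify.
    """
    if len(p_linear) != len(p_tree):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_linear <= 1.0:
        raise ValueError("weight_linear must be in [0, 1]")
    w = weight_linear
    return [w * a + (1.0 - w) * b for a, b in zip(p_linear, p_tree)]


def predict_labels(probabilities, threshold=0.5):
    """Turn fused probabilities into 0/1 repurchase predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical scores for four customers (not data from the paper).
p_lr = [0.2, 0.6, 0.4, 0.9]   # e.g. from a logistic regression model
p_xgb = [0.1, 0.8, 0.7, 0.7]  # e.g. from an XGBoost model

# Give the tree-based model more weight, since the abstract reports
# that the nonlinear model predicts better on its own.
fused = fuse_probabilities(p_lr, p_xgb, weight_linear=0.25)
labels = predict_labels(fused)
```

In practice the weight would be tuned on a validation set. Averaging is only one option; stacking the two models under a second-level learner is another.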