
Entity Matching of Shop Accounts in Online Commerce Portals
Author(s) -
Dina Salsabila,
Takdir Takdir
Publication year - 2022
Publication title -
Proceedings of International Conference on Data Science and Official Statistics
Language(s) - English
Resource type - Journals
ISSN - 2809-9842
DOI - 10.34123/icdsos.v2021i1.71
Subject(s) - computer science, matching (statistics), naive bayes classifier, identification (biology), object (grammar), similarity (geometry), word (group theory), core (optical fiber), information retrieval, value (mathematics), precision and recall, data mining, world wide web, data science, artificial intelligence, machine learning, image (mathematics), statistics, support vector machine, mathematics, telecommunications, botany, geometry, biology
Online marketplace data are currently a valuable source for analysis for various purposes. During data collection, duplicate shop accounts were found, which biases the analysis. This study develops a mechanism to identify duplicate entities, i.e. store accounts, across different online marketplaces, a task commonly known as entity matching. Word similarity algorithms are adopted as the core elements of our approach. Additionally, we present an entity matching model by examining logistic regression, naive Bayes, and random forest to find the best model for classifying store account similarities. The top online marketplaces in Indonesia are the object of our study, limited to one developing municipality, i.e. Sleman, DI Yogyakarta. The results show that the best model has an accuracy of 0.961, a precision of 0.963, a recall of 0.958, and an F1-score of 0.962. These results are therefore acceptable for duplicate identification.
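
To make the role of word similarity concrete, the following is a minimal sketch of how similarity scores between two shop names could be turned into features for a classifier such as the ones examined in the study. The abstract does not say which similarity algorithms or preprocessing steps were used, so the three measures chosen here (token Jaccard, normalized Levenshtein, and the `difflib` sequence ratio), the `normalize` step, and all function names are assumptions for illustration only, not the authors' method.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(name.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity_features(name_a: str, name_b: str) -> dict:
    """Return similarity scores in [0, 1] for a pair of shop names."""
    a, b = normalize(name_a), normalize(name_b)

    # Token-level Jaccard similarity: shared words over all words.
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 1.0

    # Character-level similarity derived from the edit distance.
    longest = max(len(a), len(b))
    lev_sim = 1.0 - levenshtein(a, b) / longest if longest else 1.0

    # Standard-library sequence similarity ratio.
    ratio = SequenceMatcher(None, a, b).ratio()

    return {"jaccard": jaccard, "levenshtein": lev_sim, "ratio": ratio}


# Hypothetical shop names from two marketplaces.
print(similarity_features("Toko Batik Sleman", "toko batik  sleman"))
print(similarity_features("Toko Batik Sleman", "Batik Jaya Store"))
```

Under these assumptions, each candidate pair of accounts yields one feature vector, and a labeled set of such vectors (duplicate or not) could then be used to train and compare logistic regression, naive Bayes, and random forest classifiers.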