Entity perception of Two-Step-Matching framework for public opinions
Author(s) - Rende Li, Hao-Tian Ma, Ziyi Wang, Qiang Guo, Jian-Guo Liu
Publication year - 2020
Publication title - Journal of Safety Science and Resilience
Language(s) - English
Resource type - Journals
eISSN - 2096-7527
pISSN - 2666-4496
DOI - 10.1016/j.jnlssr.2020.06.005
Subject(s) - matching (statistics), directory, computer science, perception, process (computing), identification (biology), information retrieval, data mining, artificial intelligence, mathematics, statistics, psychology, botany, biology, operating system, neuroscience
Highlights
• A Two-Step-Matching method is designed to identify the precise target Chinese entity from ambiguous user comments in public opinion data.
• A BiLSTM-CRF model extracts potential entities, and a TF-IDF model extracts characteristic words, from user comments.
• The first matching step uses the Jaro-Winkler distance algorithm against a business directory built from entity registration details.
• If ambiguity remains, the second matching step introduces an industry-characteristic dictionary to identify the precise target entity.
• Associated rate and accuracy rate are used to evaluate the method, and both indicators improve markedly.

Abstract
Entity perception in ambiguous user comments is a critical problem for target identification in large volumes of public opinion. In this paper, a Two-Step-Matching method is proposed to identify the precise target entity among the multiple entities mentioned. First, potential entities are extracted from public comments by a BiLSTM-CRF model, and characteristic words by a TF-IDF model. Second, the first matching is performed between the potential entities and an official business directory using the Jaro-Winkler distance algorithm. Then, to find the precise entity, an industry-characteristic dictionary is introduced in the second matching process: the precise entity is identified according to the number of characteristic words that match the industry-characteristic dictionary. In addition, the associated rate (a global indicator) and the accuracy rate (a sample indicator) are defined to evaluate matching accuracy. Results on three data sets of public opinions about major public health events show that the highest associated rate and accuracy rate reach 0.93 and 0.95, improving on average by 32% and 30%, respectively, over using the first matching process alone. The framework provides a method for finding the true target entity that a public comment actually refers to.
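The two matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not show. The directory entries, industry dictionary, function names (`first_match`, `two_step_match`), the 0.85 similarity threshold, and the tie-breaking rule are all hypothetical, and English names stand in for Chinese entities for readability. The potential entity and characteristic words are assumed to have already been extracted upstream (by BiLSTM-CRF and TF-IDF in the paper).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler similarity: Jaro plus a bonus for a shared prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def first_match(entity, directory, threshold=0.85):
    """Step 1: directory entries whose name is similar enough to the entity.

    The threshold is a hypothetical value chosen for this example.
    """
    scored = [(jaro_winkler(entity, e["name"]), e) for e in directory]
    hits = [(s, e) for s, e in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


def two_step_match(entity, words, directory, industry_dict, threshold=0.85):
    """Step 2: if several candidates remain, pick the one whose industry
    vocabulary overlaps most with the comment's characteristic words."""
    candidates = first_match(entity, directory, threshold)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0][1]["name"]

    def rank(pair):
        score, entry = pair
        overlap = len(set(words) & industry_dict.get(entry["industry"], set()))
        return (overlap, score)  # assumed tie-break: string similarity

    return max(candidates, key=rank)[1]["name"]


# Hypothetical business directory and industry-characteristic dictionary.
DIRECTORY = [
    {"name": "Sunrise Pharmacy Co", "industry": "pharmacy"},
    {"name": "Sunrise Pharma Logistics", "industry": "logistics"},
    {"name": "Harbor Foods", "industry": "food"},
]
INDUSTRY_DICT = {
    "pharmacy": {"mask", "medicine", "prescription"},
    "logistics": {"delivery", "freight", "parcel"},
    "food": {"menu", "snack", "restaurant"},
}

if __name__ == "__main__":
    result = two_step_match(
        "Sunrise Pharm", ["mask", "medicine", "price"], DIRECTORY, INDUSTRY_DICT
    )
    print(result)
```

In this example the first step leaves two similarly named candidates, so string similarity alone is ambiguous. The second step resolves the ambiguity because the comment's characteristic words overlap with the pharmacy vocabulary and not with the logistics vocabulary.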
