Information Extraction Tasks based on BERT and SpaCy on Tourism Domain
Author(s) - Chantana Chantrapornchai, Aphisit Tunsakul
Publication year - 2021
Publication title - ECTI Transactions on Computer and Information Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.132
H-Index - 2
ISSN - 2286-9131
DOI - 10.37936/ecti-cit.2021151.228621
Subject(s) - automatic summarization , computer science , information retrieval , information extraction , named entity recognition , relation (database) , relationship extraction , crawling , tourism , domain (mathematical analysis) , sentence , set (abstract data type) , natural language processing , artificial intelligence , world wide web , data mining , task (project management) , medicine , mathematical analysis , programming language , mathematics , management , anatomy , political science , law , economics
In this paper, we present two methodologies for extracting particular information from the full text returned by a search engine, to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification, and text summarization. The first step is building the training data and cleansing it. We consider the tourism domain, covering restaurant, hotel, shopping, and tourism data sets crawled from websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps follow, including sentence extraction and relation and named entity extraction for tagging purposes. These steps are needed to create proper training data. Then, a recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities (i.e., name, location, and facility) as well as relation types, classify the reviews, or summarize the reviews. Two tools, SpaCy and BERT, are used, and their performance on these tasks is compared.
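
The data-preparation steps described in the abstract (sentence extraction, then tagging entities from a built vocabulary) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the vocabulary entries, the sample review, the label names `NAME`, `LOCATION`, and `FACILITY`, and the helper functions are all hypothetical. The output uses the character-offset tuple layout `(text, {"entities": [(start, end, label)]})` that spaCy v2 training examples commonly take, and only the Python standard library is needed.

```python
import re

# Hypothetical vocabulary: surface form -> entity label (illustrative only).
VOCAB = {
    "Grand Palace Hotel": "NAME",
    "Bangkok": "LOCATION",
    "swimming pool": "FACILITY",
}


def split_sentences(text):
    """Naive sentence extraction: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence, vocab):
    """Return (sentence, {"entities": [(start, end, label), ...]})."""
    spans = []
    # Try longer terms first so they win over shorter overlapping terms.
    for term in sorted(vocab, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            # Keep the match only if it does not overlap an accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), vocab[term]))
    return sentence, {"entities": sorted(spans)}


def build_training_data(text, vocab):
    """Tag every sentence and keep only those with at least one entity."""
    tagged = (tag_sentence(s, vocab) for s in split_sentences(text))
    return [item for item in tagged if item[1]["entities"]]


# Made-up review text, used only to exercise the functions above.
review = "Grand Palace Hotel is in Bangkok. It has a swimming pool. We loved it!"
TRAIN_DATA = build_training_data(review, VOCAB)

for sentence, annotations in TRAIN_DATA:
    print(sentence, annotations)
```

Dictionary matching of this kind only bootstraps labels; in practice the tagged sentences would still need manual review before being used to train an NER model in either SpaCy or BERT.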
