Feature Generation of Dictionary for Named-Entity Recognition based on Machine Learning
Author(s) - JaeHoon Kim, Hyung-Chul Kim, Yunsoo Choi
Publication year - 2010
Publication title - Journal of Information Management
Language(s) - English
Resource type - Journals
ISSN - 0254-3621
DOI - 10.1633/jim.2010.41.2.031
Subject(s) - named entity recognition, computer science, wordnet, artificial intelligence, natural language processing, feature (linguistics), phrase, named entity, information extraction, scheme (mathematics), word (group theory), information retrieval, linguistics, mathematical analysis, philosophy, mathematics, management, economics, task (project management)
Named-entity recognition (NER), a subtask of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are continually created and changed in Web documents, newspapers, and other sources. This constant creation of new NEs causes an unknown-word problem and complicates many applications that rely on NER. To alleviate this problem, this paper proposes a new feature generation method for machine learning-based NER. In general, features in machine learning-based NER are defined over words, whereas entries in named-entity dictionaries are phrases, so dictionary entries cannot be used directly as features of NER systems. This paper proposes an encoding scheme, as a feature generation method, that converts phrase-level entities into word-level features. Furthermore, this scheme allows entities carrying semantic information from WordNet to be converted into features of NER systems. Our experiments show that the F1 score increases by about 6% and that errors are reduced by about 38%.
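The abstract does not spell out the encoding scheme, so the following is only a minimal sketch of one common way to turn phrase-level dictionary entries into per-word features. It assumes a BIO-style encoding and greedy longest-match lookup, neither of which is confirmed by the abstract. The function name `dictionary_features`, the toy dictionary, and the labels `LOC` and `ORG` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def dictionary_features(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
) -> List[str]:
    """Return one dictionary feature per token.

    Assumption: BIO-style encoding with greedy longest match. The first word
    of a matched phrase gets "B-<label>", later words get "I-<label>", and
    unmatched words get "O".
    """
    max_len = max((len(key) for key in dictionary), default=0)
    features = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            label = dictionary.get(phrase)
            if label is not None:
                features[i] = "B-" + label
                for j in range(i + 1, i + n):
                    features[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return features


# Hypothetical toy dictionary: lowercase phrase (as a word tuple) -> entity label.
example_dict = {
    ("new", "york"): "LOC",
    ("new", "york", "times"): "ORG",
    ("seoul",): "LOC",
}
tokens = ["The", "New", "York", "Times", "opened", "a", "Seoul", "office"]
features = dictionary_features(tokens, example_dict)
print(list(zip(tokens, features)))
```

Each token now carries a single categorical value that a word-based learner can consume alongside its other features. Under the same assumptions, a label derived from a WordNet class could take the place of the entity label in the dictionary values.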