SIEMÊS – A Named-Entity Recognizer for Portuguese Relying on Similarity Rules
Author(s) - Luís Sarmento
Publication year - 2006
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-34045-9
DOI - 10.1007/11751984_10
Subject(s) - computer science, artificial intelligence, natural language processing, identification (biology), set (abstract data type), portuguese, scope (computer science), similarity (geometry), entity linking, information retrieval, knowledge base, linguistics, image (mathematics), programming language, philosophy, botany, biology
In this paper we describe SIEMÊS, a named-entity recognition system for Portuguese that bases its classification procedure on a set of similarity rules. These rules seek soft matches between candidate entities found in text and instances contained in a wide-scope gazetteer and, by exploiting lexical similarities, avoid the need to code large sets of rules. Using this matching procedure, SIEMÊS generates a set of classification hypotheses based solely on internal evidence, which may be disambiguated in a later step by relatively simple rules based on contextual clues. We explain the SIEMÊS architecture and its named-entity identification and classification procedure. We also briefly discuss the results of SIEMÊS's participation in HAREM, the named-entity evaluation contest for Portuguese, and describe future work.
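The soft-matching idea in the abstract can be illustrated with a minimal sketch. It is not the SIEMÊS implementation: the abstract does not specify the similarity rules, so the standard library's `difflib.SequenceMatcher` stands in for them here, and the gazetteer entries, category labels, the `classification_hypotheses` function, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer (entry -> category); not taken from the paper.
GAZETTEER = {
    "Universidade do Porto": "ORGANIZATION",
    "Universidade de Lisboa": "ORGANIZATION",
    "Rio Douro": "LOCATION",
    "Rio de Janeiro": "LOCATION",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive lexical similarity in [0, 1].

    Assumption: a generic character-level ratio replaces the paper's
    similarity rules, which the abstract does not detail.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classification_hypotheses(candidate, gazetteer=GAZETTEER, threshold=0.7):
    """Return (category, best_score) pairs for a candidate entity.

    Only internal evidence (the candidate string itself) is used; each
    category keeps the score of its closest gazetteer entry, and
    categories below the (assumed) threshold are dropped.
    """
    best = {}
    for entry, category in gazetteer.items():
        score = similarity(candidate, entry)
        if score >= threshold and score > best.get(category, 0.0):
            best[category] = score
    # Highest-scoring hypothesis first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # "Universidade do Minho" is absent from the gazetteer but lexically
    # close to an ORGANIZATION entry, so a soft match is still possible.
    print(classification_hypotheses("Universidade do Minho"))
```

When a candidate yields more than one hypothesis, the later disambiguation step described in the abstract would choose among them using contextual clues; that step is outside the scope of this sketch.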