Automatic Extraction of Property Norm‐Like Data From Large Text Corpora
Author(s) - Kelly Colin, Devereux Barry, Korhonen Anna
Publication year - 2013
Publication title - Cognitive Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.498
H-Index - 114
eISSN - 1551-6709
pISSN - 0364-0213
DOI - 10.1111/cogs.12091
Subject(s) - wordnet, computer science, natural language processing, property (philosophy), artificial intelligence, semantic similarity, parsing, norm (philosophy), similarity (geometry), relation (database), information retrieval, data mining, philosophy, epistemology, political science, law, image (mathematics)
Traditional methods for deriving property‐based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is‐a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car – petrol). We propose a system for the challenging task of automatic, large‐scale acquisition of unconstrained, human‐like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept‐relation‐feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property‐based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human‐generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human‐judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state‐of‐the‐art, while subsequent evaluations exhibit the human‐like character of our generated properties.
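
The reweighting step mentioned in the abstract (a linear combination of triple frequency and four statistical metrics) can be illustrated with a minimal Python sketch. The abstract does not name the four metrics or give their weights, so the names `metric_1` to `metric_4`, all weights, and all numeric values below are hypothetical placeholders, not the authors' actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A concept-relation-feature triple, e.g. (car, require, petrol)."""
    concept: str
    relation: str
    feature: str


def rerank(candidates, weights):
    """Score each triple as a weighted sum of its metric values.

    candidates: dict mapping Triple -> dict of metric name -> value
    weights:    dict mapping metric name -> weight (hypothetical values)
    Returns a list of (Triple, score) pairs, highest score first.
    """
    scored = []
    for triple, metrics in candidates.items():
        # Linear combination: missing metrics contribute zero.
        score = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
        scored.append((triple, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: frequency plus four unnamed placeholder metrics.
weights = {
    "frequency": 0.4,
    "metric_1": 0.15,
    "metric_2": 0.15,
    "metric_3": 0.15,
    "metric_4": 0.15,
}

# Hypothetical, already-normalised metric values for three candidate triples.
candidates = {
    Triple("car", "require", "petrol"): {
        "frequency": 0.9, "metric_1": 0.8, "metric_2": 0.6,
        "metric_3": 0.7, "metric_4": 0.5,
    },
    Triple("car", "be", "fast"): {
        "frequency": 0.5, "metric_1": 0.9, "metric_2": 0.7,
        "metric_3": 0.6, "metric_4": 0.8,
    },
    Triple("car", "have", "idea"): {
        "frequency": 0.1, "metric_1": 0.1, "metric_2": 0.1,
        "metric_3": 0.1, "metric_4": 0.1,
    },
}

ranked = rerank(candidates, weights)
for triple, score in ranked:
    print(f"{triple.concept} {triple.relation} {triple.feature}: {score:.2f}")
```

In this toy setup the implausible triple (car have idea) falls to the bottom of the ranking because it scores low on every component; in a real system the weights would presumably be tuned against held-out property norm data.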
