Automatic Extraction of Property Norm‐Like Data From Large Text Corpora

Author(s) - Colin Kelly, Barry Devereux, Anna Korhonen

Publication year - 2013

Publication title - Cognitive Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.498

H-Index - 114

eISSN - 1551-6709

pISSN - 0364-0213

DOI - 10.1111/cogs.12091

Subject(s) - wordnet, computer science, natural language processing, property (philosophy), artificial intelligence, semantic similarity, parsing, norm (philosophy), similarity (geometry), relation (database), information retrieval, data mining, philosophy, epistemology, political science, law, image (mathematics)

Traditional methods for deriving property‐based representations of concepts from text have focused either on extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is‐a vehicle) or meronymy/metonymy (e.g., car has wheels), or on extracting unspecified relations (e.g., car – petrol). We propose a system for the challenging task of automatic, large‐scale acquisition of unconstrained, human‐like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept‐relation‐feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property‐based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights the triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human‐generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human‐judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: our lexical comparison shows performance comparable to the current state of the art, while the subsequent evaluations exhibit the human‐like character of our generated properties.
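
The reweighting step described in the abstract (a linear combination of a triple's frequency and four statistical metrics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the four metrics or give the weights, so the metric values, the weights, the frequencies, and the names `Triple`, `combined_score`, and `rerank` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Triple:
    """A concept-relation-feature triple, e.g. (car, require, petrol)."""
    concept: str
    relation: str
    feature: str


def combined_score(frequency: float,
                   metrics: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted sum of frequency and metrics; weights[0] applies to frequency."""
    values = [frequency, *metrics]
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value (frequency + metrics)")
    return sum(w * v for w, v in zip(weights, values))


def rerank(candidates, weights):
    """Return (score, triple) pairs sorted from highest to lowest score."""
    scored = [(combined_score(freq, metrics, weights), triple)
              for triple, freq, metrics in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy input: every number below is invented, not taken from the paper.
# Each entry is (triple, corpus frequency, four placeholder metric values).
CANDIDATES = [
    (Triple("car", "require", "petrol"), 12.0, (0.8, 0.5, 0.3, 0.9)),
    (Triple("car", "be", "fast"), 30.0, (0.4, 0.6, 0.2, 0.5)),
    (Triple("car", "cause", "pollution"), 5.0, (0.9, 0.7, 0.6, 0.8)),
]
# Assumed weights: one for frequency, then one per placeholder metric.
WEIGHTS = (0.1, 1.0, 1.0, 1.0, 1.0)

ranked = rerank(CANDIDATES, WEIGHTS)
for score, triple in ranked:
    print(f"{score:.2f}  {triple.concept} {triple.relation} {triple.feature}")
```

In a real system the weights would presumably be tuned against held-out property norm data rather than fixed by hand as they are here.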
