Abstracting for Dimensionality Reduction in Text Classification

Author(s) - McAllister Richard A., Angryk Rafal A.

Publication year - 2013

Publication title - International Journal of Intelligent Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.291

H-Index - 87

eISSN - 1098-111X

pISSN - 0884-8173

DOI - 10.1002/int.21543

Subject(s) - computer science, abstraction, ontology, natural language processing, task (project management), artificial intelligence, information retrieval, scalability, dimensionality reduction, process (computing), latent semantic analysis, word (group theory), curse of dimensionality, database, programming language, philosophy, linguistics, management, epistemology, economics

There is a growing interest in efficient models of text mining and an emergent need for new data structures that address word relationships. Detailed knowledge about the taxonomic environment of keywords that are used in text documents can provide valuable insight into the nature of the subject matter contained therein. Such insight may be used to enhance the data structures used in the text data mining task as relationships become usefully apparent. A popular scalable technique used to infer these relationships, while reducing dimensionality, has been Latent Semantic Analysis. We present a new approach, which uses an ontology of lexical abstractions to create abstraction profiles of documents and uses these profiles to perform text organization based on a process that we call frequent abstraction analysis. We introduce TATOO, the Text Abstraction TOOlkit, which is a full implementation of this new approach. We present our data model via an example of how taxonomically derived abstractions can be used to supplement semantic data structures for the text classification task.
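
To make the idea of an abstraction profile concrete, the following is a minimal sketch in Python. It is not the TATOO implementation, which is not shown on this page. The taxonomy `TOY_HYPERNYMS` and the functions `abstractions`, `abstraction_profile`, and `frequent_abstractions` are hypothetical names introduced only for illustration. The last function is a plain document-frequency filter standing in for the reduction step. It should not be read as the authors' frequent abstraction analysis.

```python
from collections import Counter

# Hypothetical toy taxonomy: each word maps to its parent (more abstract) term.
# A real system would take these links from a lexical ontology instead.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def abstractions(word, taxonomy):
    """Return the chain of ancestors of `word`, nearest ancestor first."""
    chain = []
    seen = {word}
    while word in taxonomy:
        word = taxonomy[word]
        if word in seen:  # guard against cycles in a malformed taxonomy
            break
        seen.add(word)
        chain.append(word)
    return chain


def abstraction_profile(tokens, taxonomy):
    """Count how often each abstraction is reached from a document's tokens."""
    profile = Counter()
    for token in tokens:
        profile.update(abstractions(token.lower(), taxonomy))
    return profile


def frequent_abstractions(profiles, min_support):
    """Keep abstractions present in at least `min_support` (0..1) of the documents.

    Assumption: a simple document-frequency filter, used here only to show how
    shared abstractions could form a smaller feature set than raw words.
    """
    doc_counts = Counter()
    for profile in profiles:
        doc_counts.update(profile.keys())
    threshold = min_support * len(profiles)
    return sorted(a for a, n in doc_counts.items() if n >= threshold)


if __name__ == "__main__":
    docs = [["dog", "cat"], ["wolf", "sparrow"], ["cat", "cat"]]
    profiles = [abstraction_profile(d, TOY_HYPERNYMS) for d in docs]
    print(frequent_abstractions(profiles, min_support=1.0))
```

In this toy run the three documents share no words in common across all of them, yet all three reach the abstractions "animal" and "mammal", so two shared features can stand in for five distinct surface words. That is the intuition behind using taxonomic abstractions to supplement word-level features.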
