
Candidate Feature Extraction and Categorization for Unstructured Text Document
Author(s) -
P. P. Shelke,
Aditya A Pardeshi
Publication year - 2020
Publication title -
International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Language(s) - English
Resource type - Journals
ISSN - 2456-3307
DOI - 10.32628/cseit20639
Subject(s) - computer science, artificial intelligence, categorization, feature (linguistics), natural language processing, tree (set theory), phrase, feature extraction, noun phrase, sentence, parsing, parse tree, process (computing), text categorization, pattern recognition (psychology), linguistics, noun, mathematics, mathematical analysis, philosophy, operating system
The words within phrases carry crucial information for the feature extraction process. Established techniques for feature extraction have significant limitations: they ignore the grammatical structure of phrases, so the features they extract are poor. To overcome this problem, a system is proposed that generates a parse tree for each input sentence and then cuts it down into sub-trees. Branches of the tree are extracted using part-of-speech (POS) labelling to obtain candidate phrases. Filtering is recommended to avoid redundant phrases. Finally, machine learning is used for feature categorization. The results illustrate the effectiveness of the approach.
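
The extraction steps described in the abstract (parse tree, sub-trees, POS-based candidate phrases, redundancy filtering) can be illustrated with a minimal sketch. It is not the authors' implementation. The parse trees are written by hand as nested tuples, where a real system would obtain them from a parser. The tag set KEEP_TAGS, the choice of noun-phrase (NP) sub-trees, and all function names are assumptions made for this illustration. The machine learning categorization step is not covered.

```python
# A parse tree is modelled as (label, child, child, ...); a leaf is (POS_tag, word).
# Both trees are hand-written for illustration; a real system would use a parser.
TREE_1 = ("S",
          ("NP", ("DT", "The"), ("JJ", "proposed"), ("NN", "system")),
          ("VP", ("VBZ", "builds"),
           ("NP", ("DT", "a"), ("NN", "parse"), ("NN", "tree")),
           ("PP", ("IN", "for"),
            ("NP", ("DT", "the"), ("NN", "input"), ("NN", "sentence")))))

TREE_2 = ("S",
          ("NP", ("DT", "The"), ("NN", "parse"), ("NN", "tree")),
          ("VP", ("VBZ", "yields"), ("NP", ("NNS", "features"))))

# Assumption: only adjectives and nouns are kept inside a candidate phrase.
KEEP_TAGS = {"JJ", "NN", "NNS", "NNP", "NNPS"}


def is_leaf(node):
    """A leaf pairs a POS tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the (tag, word) leaves under a node, left to right."""
    if is_leaf(node):
        return [node]
    found = []
    for child in node[1:]:
        found.extend(leaves(child))
    return found


def subtrees(node, label):
    """Yield every sub-tree whose root carries the given label."""
    if is_leaf(node):
        return
    if node[0] == label:
        yield node
    for child in node[1:]:
        yield from subtrees(child, label)


def candidate_phrases(trees):
    """Collect NP-based candidate phrases, dropping redundant repeats."""
    seen = set()
    result = []
    for tree in trees:
        for np_tree in subtrees(tree, "NP"):
            words = [word.lower() for tag, word in leaves(np_tree)
                     if tag in KEEP_TAGS]
            phrase = " ".join(words)
            if phrase and phrase not in seen:  # redundancy filter
                seen.add(phrase)
                result.append(phrase)
    return result


print(candidate_phrases([TREE_1, TREE_2]))
# ['proposed system', 'parse tree', 'input sentence', 'features']
```

The phrase "parse tree" occurs in both sentences but is returned only once, which is the effect the filtering step is meant to have. The resulting phrases would then serve as input to a classifier for categorization.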