Text classification using document-document semantic similarity

Open Access

Author(s) - Indrajit Mukherjee, Prabhat Mahanti, Vandana Bhattacharya, Samudra Banerjee

Publication year - 2013

Publication title - International Journal of Web Science

Language(s) - English

Resource type - Journals

eISSN - 1757-8809

pISSN - 1757-8795

DOI - 10.1504/ijws.2013.056572

Subject(s) - information retrieval, document classification, computer science, semantic similarity, similarity (geometry), natural language processing, document clustering, artificial intelligence, cluster analysis, image (mathematics)

One of the key problems encountered with text classification learning algorithms is that they require a large number of labelled examples to learn accurately. The objective of this paper is to propose a novel method of topic modelling and a document-document semantic similarity algorithm (DDSSA), which reduce the need for large training data. The algorithm finds the concepts and keywords of the unlabelled text and identifies its topic from the list of concepts and keywords obtained from labelled text. This is achieved by obtaining the concepts of the labelled text and identifying the keywords that hold strong relationships with the given labelled data. The topics and keywords obtained from the labelled text can be stored in a database, which in turn can be used to compute the semantic similarity with concepts obtained from the unlabelled text. The proposed method is compared with the popular latent semantic analysis (LSA), applied to NLTK and Mallet datasets. The experimental results show...
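The workflow outlined in the abstract (derive keyword profiles from labelled text, store them, then match an unlabelled document against the stored profiles) can be illustrated with a minimal sketch. This is not the authors' DDSSA: the abstract does not specify how concepts are extracted or how semantic similarity is measured, so the sketch substitutes plain term-frequency keywords and cosine similarity over word counts. The function names, the stop-word list, the in-memory dictionary standing in for the database, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "at", "by"}


def extract_keywords(text, top_n=10):
    """Return the top_n most frequent non-stop-word tokens with their counts."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    return Counter(dict(Counter(tokens).most_common(top_n)))


def build_topic_store(labelled_docs, top_n=10):
    """Map each topic label to a keyword profile built from its labelled documents.

    A plain dict stands in for the database mentioned in the abstract.
    """
    grouped = {}
    for label, text in labelled_docs:
        grouped.setdefault(label, []).append(text)
    return {label: extract_keywords(" ".join(texts), top_n) for label, texts in grouped.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two keyword-count profiles (lexical, not semantic)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, topic_store, top_n=10):
    """Assign the unlabelled text to the topic whose stored profile is most similar."""
    profile = extract_keywords(text, top_n)
    return max(topic_store, key=lambda label: cosine_similarity(profile, topic_store[label]))


# Made-up sample data, for illustration only.
labelled = [
    ("sports", "The team won the football match after a late goal."),
    ("sports", "The striker scored a goal in the final match."),
    ("finance", "The bank raised interest rates as the market fell."),
    ("finance", "Investors sold shares when the stock market dropped."),
]
store = build_topic_store(labelled)
print(classify("A late goal decided the match.", store))
```

Because the similarity here is purely lexical, two documents on the same topic that share no words would score zero. A semantic measure of the kind the paper's title refers to would be needed to handle that case.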
