Open Access
Self-Tuned Descriptive Document Clustering using a Predictive Network
Author(s) - K. Syed Kousar Niasi, P. Sidheshwari
Publication year - 2019
Publication title - International Journal of Scientific Research in Science, Engineering and Technology
Language(s) - English
Resource type - Journals
eISSN - 2395-1990
pISSN - 2394-4099
DOI - 10.32628/ijsrset21841135
Subject(s) - computer science, automatic summarization, information retrieval, ranking (information retrieval), document clustering, search engine indexing, tf–idf, rank (graph theory), cluster analysis, topic model, key (lock), subject (documents), data mining, world wide web, term (time), artificial intelligence, physics, mathematics, computer security, quantum mechanics, combinatorics
A document network is defined as a collection of documents that are connected by links. Document clustering has become ubiquitous owing to the widespread use of online databases such as academic search engines. Topic modeling has become a widely used tool for document management because of its superior performance. However, few topic models differentiate the importance of documents on different topics. In this survey, we apply text rank algorithms to documents to improve topic modeling, and we propose incorporating link-based ranking into topic modeling. Text summarization plays an important role in information retrieval; the snippets that web search engines generate for each query result are one application of it. In existing text summarization techniques, indexing is based on the words present in the document, and the index consists of an array of posting lists. Document features such as term frequency and text length are used to assign indexing weights to words. Specifically, topical rank is used to compute the topic-level ranking of documents, which indicates the importance of documents on different topics. By treating the topical ranking of a document as the probability that the document is involved in the corresponding topic, a generalized relation is created between ranking and topic modeling. In this thesis, we implement a topic discovery model for a large number of medical databases. The model is trained on the datasets, and key terms are extracted using text mining and fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy problem in health and medical corpora and provides a new method for estimating the number of topics.
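
The indexing scheme mentioned in the abstract (posting lists keyed by word, with weights derived from term frequency and text length) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_index`, the whitespace tokenizer, and the choice of weight (term frequency divided by document length) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> posting list of (doc_id, weight).

    Assumption: the weight is the term frequency normalized by the
    document length in tokens. The abstract names term frequency and
    text length as features but does not give a formula.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase and split on whitespace as a toy tokenizer.
        tokens = text.lower().split()
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        counts = Counter(tokens)
        for term, tf in counts.items():
            index[term].append((doc_id, tf / len(tokens)))
    return dict(index)


# Hypothetical two-document corpus.
docs = [
    "topic model ranks documents",
    "documents link to documents",
]
index = build_index(docs)
print(index["documents"])
```

Each posting list is ordered by document id because documents are processed in sequence, so lists for different terms can be merged in a single pass when answering multi-word queries.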
