Information Retrieval based on Cluster Analysis Approach
Author(s) -
Orabe Almanaseer
Publication year - 2021
Publication title -
International Journal of Computer Science and Information Technology (Chennai. Print)
Language(s) - English
Resource type - Journals
eISSN - 0975-4660
pISSN - 0975-3826
DOI - 10.5121/ijcsit.2021.13502
Subject(s) - computer science , information retrieval , cluster analysis , vector space model , set (abstract data type) , preprocessor , precision and recall , process (computing) , data mining , result set , rank (graph theory) , artificial intelligence , mathematics , combinatorics , programming language , operating system
The huge volume of text documents available on the internet has made it difficult for users to find the information that is valuable to them, so efficient applications for extracting knowledge of interest from textual documents are vitally important. This paper addresses the problem of responding to user queries by fetching the most relevant documents from a clustered set of documents. For this purpose, a cluster-based information retrieval framework is proposed, with the aim of designing and developing a system for analysing and extracting useful patterns from text documents. In this approach, a preprocessing step is first performed to find frequent and high-utility patterns in the dataset. A Vector Space Model (VSM) is then used to represent the dataset. The system was implemented in two main phases. In phase 1, a cluster analysis process groups the documents into several clusters. In phase 2, an information retrieval process ranks the clusters against the user query and retrieves the relevant documents from the clusters deemed relevant to that query. The retrieved results are then evaluated using recall and precision (P@5, P@10): P@5 was 0.660 and P@10 was 0.655.
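The two phases described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the clustering algorithm or the term-weighting scheme, so k-means over smoothed TF-IDF vectors with cosine similarity is assumed here, and the frequent and high-utility pattern mining step is omitted. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenizer; keeps purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    # VSM representation: one sparse dict (term -> TF-IDF weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption; the abstract does not give a formula).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    # Phase 1 (assumed algorithm): k-means with cosine similarity.
    # Seeds are the first k documents so the sketch is deterministic;
    # requires len(vectors) >= k.
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

def retrieve(query, docs, k=2, top_clusters=1):
    # Phase 2: rank clusters against the query, then rank the documents
    # inside the best-matching clusters. Returns document indices.
    vectors, idf = tfidf_vectors(docs)
    assignment, centroids = kmeans(vectors, k)
    q = {t: tf * idf[t]
         for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(k), key=lambda c: cosine(q, centroids[c]),
                    reverse=True)
    chosen = set(ranked[:top_clusters])
    candidates = [i for i, a in enumerate(assignment) if a in chosen]
    return sorted(candidates, key=lambda i: cosine(q, vectors[i]),
                  reverse=True)

def precision_at_k(ranked, relevant, k):
    # P@k: fraction of the top-k retrieved documents that are relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k

# Hypothetical toy collection: two pet documents, two finance documents.
DOCS = [
    "cats and dogs are pets",
    "stock markets fell sharply today",
    "dogs chase cats",
    "investors sold stock as markets fell",
]
ranked_docs = retrieve("dogs and cats", DOCS, k=2, top_clusters=1)
print(ranked_docs, precision_at_k(ranked_docs, {0, 2}, 2))
```

Restricting the document ranking to the top-ranked clusters is what distinguishes this from plain VSM retrieval: only documents in clusters judged relevant to the query are scored, which is the behaviour the abstract attributes to phase 2. P@5 and P@10 correspond to calling `precision_at_k` with `k=5` and `k=10` on a ranked list of at least that length.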
