SC‐OCR: similarity‐based clustering and optimum cache replacement approach
Author(s) - Malli Subramanian Sabitha, Soundarajan Vijayalakshmi
Publication year - 2016
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3916
Subject(s) - computer science, cache, big data, spark (programming language), cluster analysis, cloud computing, data mining, process (computing), database, parallel computing, machine learning, operating system, programming language
Summary Big data refers to large‐scale, complex datasets. It is expanding rapidly across all science and engineering domains, owing to the fast development of networking, data storage, and data collection capacity. Big data mining is the extraction of useful information from these large datasets. Integrating cloud computing with data mining for the big data mining process remains a challenging task. Processing such huge amounts of data requires improving big data computation itself. Most existing approaches use MapReduce to process big data, and their main drawbacks are increased computational cost and memory consumption. To overcome these limitations, this paper proposes a similarity‐based clustering and optimum cache replacement approach for big data computing applications. The job recovery process is initiated by copying the data in the cloud server and forwarding the data copy for further processing. Then, the job is divided into clusters using the similarity‐based clustering approach. Finally, a cache with the optimum cache replacement algorithm is introduced to avoid repeated execution of jobs through queue management. The proposed approach is compared with the existing Spark and Hadoop approaches and achieves better performance in terms of iteration time, query response time, job completion time, and clustering accuracy. Copyright © 2016 John Wiley & Sons, Ltd.
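
The summary does not specify how the optimum cache replacement algorithm works, so the following is only a minimal sketch of the general idea of caching job results to avoid repeated execution. It assumes that the pending job queue is known in advance and substitutes the textbook farthest-next-use eviction rule (Belady's policy); this is not presented as the paper's method. The names `run_queue`, `execute`, and `capacity` are hypothetical.

```python
from collections import deque


def run_queue(jobs, capacity, execute):
    """Run jobs in queue order, reusing cached results where possible.

    Assumption: the remaining queue is fully visible, so eviction can pick
    the cached job whose next use is farthest away (or never happens).
    Returns (results, number_of_real_executions).
    """
    pending = deque(jobs)
    cache = {}          # job identifier -> cached result
    results = []
    executions = 0

    def next_use(key):
        # Position of the next occurrence of `key` in the pending queue.
        for position, queued in enumerate(pending):
            if queued == key:
                return position
        return float("inf")  # never requested again

    while pending:
        job = pending.popleft()
        if job in cache:
            # Cache hit: the job is not executed again.
            results.append(cache[job])
            continue

        value = execute(job)
        executions += 1
        results.append(value)

        if capacity <= 0:
            continue  # caching disabled
        if len(cache) >= capacity:
            # Evict the entry that will be needed farthest in the future.
            victim = max(cache, key=next_use)
            del cache[victim]
        cache[job] = value

    return results, executions


if __name__ == "__main__":
    out, runs = run_queue(["a", "b", "a", "c", "a", "b"], 2, str.upper)
    print(out, runs)
```

In this sketch a job identifier stands in for whatever key identifies a repeated job; a real system would also need a rule for deciding when two jobs are similar enough to share a result, which the summary attributes to the clustering step.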
