SC‐OCR: similarity‐based clustering and optimum cache replacement approach

Author(s) - Malli Subramanian Sabitha, Soundarajan Vijayalakshmi

Publication year - 2016

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3916

Subject(s) - computer science, cache, big data, spark (programming language), cluster analysis, cloud computing, data mining, process (computing), database, parallel computing, machine learning, operating system, programming language

Summary Big data refers to large‐scale, complex datasets. It is expanding rapidly across all science and engineering domains, owing to the fast development of networking, data storage, and data collection capacity. Big data mining is the extraction of useful information from these large datasets. Integrating cloud computing with data mining for this purpose remains a challenging task, and processing such large volumes of data requires improving big data computation itself. Most existing approaches use MapReduce to process big data; their main drawbacks are increased computational cost and memory consumption. To overcome these limitations, this paper proposes a similarity‐based clustering and optimum cache replacement (SC‐OCR) approach for big data computing applications. The job recovery process is initiated by copying the data in the cloud server and forwarding the copy for further processing. The job is then divided into clusters using the similarity‐based clustering approach. Finally, a cache with the optimum cache replacement algorithm is introduced, which avoids repeated execution of jobs through queue management. The proposed approach is compared with the existing Spark and Hadoop approaches and achieves better performance in terms of iteration time, query response time, job completion time, and clustering accuracy. Copyright © 2016 John Wiley & Sons, Ltd.
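The summary names two building blocks, similarity‐based clustering of jobs and a cache with an "optimum" replacement policy, without giving their details. The following is only an illustrative sketch under stated assumptions, not the authors' method: jobs are described by hypothetical feature sets and grouped greedily by Jaccard similarity, and the replacement policy is assumed to be Belady's classic optimal rule (evict the entry whose next request is farthest in the future), which is usable only when the request queue is known in advance. The names `jaccard`, `cluster_jobs`, `simulate_optimal_cache`, and the `threshold` parameter are invented for this example.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_jobs(jobs: Dict[str, FrozenSet[str]],
                 threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass clustering (an assumption, not the paper's algorithm).

    A job joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, features in jobs.items():
        for cluster in clusters:
            if jaccard(features, jobs[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def simulate_optimal_cache(requests: Sequence[Hashable], capacity: int) -> int:
    """Count cache hits under Belady's rule for a fully known request queue."""
    cache: set = set()
    hits = 0
    for i, key in enumerate(requests):
        if key in cache:
            hits += 1
            continue
        if capacity <= 0:
            continue  # nothing can be cached
        if len(cache) >= capacity:
            def next_use(k: Hashable) -> float:
                # Position of the next request for k, or infinity if none.
                for j in range(i + 1, len(requests)):
                    if requests[j] == k:
                        return j
                return float("inf")
            # Evict the entry that will not be needed for the longest time.
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return hits
```

In this sketch, a cache hit stands for a job result that does not have to be recomputed. Belady's rule needs the whole future queue, so a real system would have to approximate it or rely on a scheduler that exposes pending jobs; the summary does not say which the paper does.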
