Solution for the future: small file management by optimizing Hadoop
Author(s) -
O. Achandair,
S Bourekkadi,
E Elmahouti,
Samira Khoulji,
Mohamed Larbi Kerkeb
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i2.6.10773
Subject(s) - computer science , scalability , distributed file system , block (permutation group theory) , operating system , metadata , component (thermodynamics) , database , file system , computer file , distributed computing , physics , geometry , mathematics , thermodynamics
Hadoop Distributed File System (HDFS) is designed to reliably store very large files across machines in a large cluster. It is one of the most widely used distributed file systems and offers high availability and scalability on low-cost hardware. HDFS is the storage component of every Hadoop framework. Coupled with MapReduce, the processing component, it has become the standard platform for big data management. By design, HDFS can handle huge numbers of large files, but deployments that must handle large numbers of small files may not be very effective. This paper puts forward a new strategy for managing small files. The approach consists of two principal phases. The first phase consolidates clients' input files, storing them continuously in a particular allocated block in SequenceFile format, and continuing into the next blocks as each one fills. This avoids multiple block allocations for different streams, which reduces calls for available blocks and also reduces the metadata memory used on the NameNode, because a group of small files packaged in a SequenceFile on the same block requires one entry instead of one entry per small file. The second phase analyzes the attributes of the stored small files so that they can be distributed in such a way that the most frequently called files are referenced by an additional index, in MapFile format, to reduce read overhead during random access.
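
The two phases described above can be illustrated with a small in-memory sketch. This is not the paper's implementation and it does not call any Hadoop API: the `Container` class is a hypothetical stand-in for one SequenceFile filling one block, the plain dictionary returned by `build_hot_index` is a hypothetical stand-in for a MapFile index, and the 128 MiB block size is an assumed value. The entry count is also a simplification of how the NameNode actually accounts for files and blocks.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes, not taken from the paper


@dataclass
class Container:
    """Hypothetical stand-in for one SequenceFile that fills one block."""
    records: list = field(default_factory=list)  # (file name, size) pairs
    used: int = 0  # bytes already stored in this container


def pack_small_files(files, block_size=BLOCK_SIZE):
    """Phase 1 sketch: append files in arrival order, opening a new
    container only when the current one cannot hold the next file."""
    containers = [Container()]
    for name, size in files:
        if size > block_size:
            raise ValueError(f"{name} is larger than one block")
        current = containers[-1]
        if current.used + size > block_size:
            current = Container()
            containers.append(current)
        current.records.append((name, size))
        current.used += size
    return containers


def build_hot_index(containers, access_counts, top_n):
    """Phase 2 sketch: map the top_n most-read files to their
    (container number, record number) so a read can skip the scan."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    hot = set(ranked[:top_n])
    index = {}
    for c_no, container in enumerate(containers):
        for r_no, (name, _size) in enumerate(container.records):
            if name in hot:
                index[name] = (c_no, r_no)
    return index


if __name__ == "__main__":
    # 300 files of 1 MiB each: one metadata entry per file if stored
    # separately, one per container under the simplified packed model.
    files = [(f"f{i}", 1024 * 1024) for i in range(300)]
    packed = pack_small_files(files)
    print(len(files), "entries unpacked,", len(packed), "entries packed")
```

Under these assumptions, 300 files of 1 MiB each fit into three containers, so the simplified model tracks three entries instead of 300, and a lookup for an indexed file goes straight to its container and record position.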
