Solution for the future: small file management by optimizing Hadoop

O. Achandair, S. Bourekkadi, E. Elmahouti, Samira Khoulji, Mohamed Larbi Kerkeb

Open Access

Publication year - 2018

Publication title - International Journal of Engineering and Technology

Language(s) - English

Resource type - Journals

ISSN - 2227-524X

DOI - 10.14419/ijet.v7i2.6.10773

Subject(s) - computer science, scalability, distributed file system, block (permutation group theory), operating system, metadata, component (thermodynamics), database, file system, computer file, distributed computing, physics, geometry, mathematics, thermodynamics

Hadoop Distributed File System (HDFS) is designed to reliably store very large files across the machines of a large cluster. It is one of the most widely used distributed file systems and offers high availability and scalability on low-cost hardware. Every Hadoop framework uses HDFS as its storage component. Coupled with MapReduce, the processing component, HDFS has become the standard platform for big data management today. By design, however, HDFS handles huge numbers of large files well, but it can be ineffective when deployed to handle large numbers of small files. This paper puts forward a new strategy for managing small files. The approach consists of two principal phases. The first phase consolidates the clients' input files, storing them one after another in a specifically allocated block, in SequenceFile format, and continuing into the next blocks as each one fills. This avoids separate block allocations for different streams, which reduces both the number of requests for available blocks and the metadata memory consumed on the NameNode: a group of small files packaged in a SequenceFile on the same block requires one entry instead of one entry per small file. The second phase analyzes the attributes of the stored small files so that they can be distributed in such a way that the most frequently requested files are referenced by an additional index, in MapFile format, to reduce read overhead during random access.
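
The two phases can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and does not use the Hadoop API (where the corresponding classes are `org.apache.hadoop.io.SequenceFile` and `org.apache.hadoop.io.MapFile`). The record layout, the function names (`pack_small_files`, `read_at`, `scan_for`, `build_hot_index`), and the use of a simple access log to decide which files are "most requested" are all assumptions made for illustration.

```python
import io
import struct
from collections import Counter


def pack_small_files(files):
    """Phase 1 (sketch): append many small files into one container.

    Record layout is an assumption of this sketch, not the SequenceFile
    wire format: 4-byte key length, 4-byte value length, key, value.
    Returns the container bytes and the byte offset of every record.
    """
    buf = io.BytesIO()
    offsets = {}
    for name, data in files.items():
        key = name.encode("utf-8")
        offsets[name] = buf.tell()
        buf.write(struct.pack(">II", len(key), len(data)))
        buf.write(key)
        buf.write(data)
    return buf.getvalue(), offsets


def read_at(blob, offset):
    """Read one (name, data) record starting at a known offset."""
    key_len, val_len = struct.unpack_from(">II", blob, offset)
    start = offset + 8
    key = blob[start:start + key_len].decode("utf-8")
    value = blob[start + key_len:start + key_len + val_len]
    return key, value


def scan_for(blob, name):
    """Lookup without an index: walk the records from the beginning."""
    offset = 0
    while offset < len(blob):
        key, value = read_at(blob, offset)
        if key == name:
            return value
        offset += 8 + len(key.encode("utf-8")) + len(value)
    raise KeyError(name)


def build_hot_index(offsets, access_log, top_n):
    """Phase 2 (sketch): index only the top_n most requested files."""
    counts = Counter(access_log)
    hot = [name for name, _ in counts.most_common(top_n)]
    return {name: offsets[name] for name in hot if name in offsets}


# Hypothetical data: three small files and a short access history.
files = {"a.txt": b"alpha", "b.txt": b"bravo", "c.txt": b"charlie"}
blob, offsets = pack_small_files(files)
hot_index = build_hot_index(offsets, ["b.txt", "c.txt", "b.txt"], top_n=1)
```

In this sketch a file listed in `hot_index` is fetched with a single `read_at` call, while any other file falls back to `scan_for`. The container stands in for one stored object rather than one per small file, which mirrors the metadata argument in the abstract. A real MapFile additionally keeps its records sorted by key, which this simplification omits.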
