Open Access
Effective processing of unstructured data using python in Hadoop map reduce
Author(s) - K. Kousalya, Shaik Javed Parvez
Publication year - 2018
Publication title - International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i2.21.12456
Subject(s) - python (programming language), computer science, java, map reduce, reducer, unstructured data, open source, operating system, database, parallel computing, big data, software, civil engineering, engineering
At present, most of the data being generated is unstructured, which makes such a wide range of data difficult to handle. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open-source platform that widely uses the MapReduce framework. MapReduce is a popular and effective way to process unstructured data in parallel. It has two stages, namely map and reduce: the input is split into small blocks, and worker nodes process the individual blocks in parallel. MapReduce programs are generally written in Java, but Hadoop Streaming allows the mapper and reducer to be written in other languages such as Python. In this paper, we show an alternative way of processing growing unstructured content by using Python. We also compare the performance of Java-based and non-Java-based programs.
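
The abstract does not include code, so the following is only a minimal sketch of how a Python mapper and reducer for Hadoop Streaming might look. It assumes a word-count workload, which is an illustrative choice and not necessarily the task used in the paper. The names `mapper`, `reducer`, and `run_locally` are hypothetical. Records are tab-separated key/value lines, which is the default convention Hadoop Streaming uses between stages.

```python
from itertools import groupby


def mapper(lines):
    """Map stage: emit one 'word<TAB>1' record per word of raw text."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce stage: sum the counts for each word.

    Assumes records arrive sorted by key, which is how Hadoop Streaming
    delivers mapper output to a reducer.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate the map -> sort -> reduce flow in one process (no Hadoop).

    Hypothetical helper for checking the logic on a small sample.
    """
    return list(reducer(sorted(mapper(lines))))
```

In an actual Streaming job, the two stages would normally live in separate scripts that read from `sys.stdin` and write to `sys.stdout`, and would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options. The in-process `run_locally` helper only stands in for the shuffle-and-sort step so the logic can be tried without a cluster.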
