Effective processing of unstructured data using python in Hadoop map reduce


Open Access

Author(s) - K. Kousalya, Shaik Javed Parvez

Publication year - 2018

Publication title - International Journal of Engineering and Technology

Language(s) - English

Resource type - Journals

ISSN - 2227-524X

DOI - 10.14419/ijet.v7i2.21.12456

Subject(s) - python (programming language), computer science, java, map reduce, reducer, unstructured data, open source, operating system, database, parallel computing, big data, software, civil engineering, engineering

At present, most newly generated data is unstructured, which makes handling such a wide range of data difficult. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open-source platform that widely uses the MapReduce framework. MapReduce is popular and effective for processing unstructured data in parallel. It has two stages, map and reduce (described here as transform and repository): the input is split into small blocks, and worker nodes process the individual blocks in parallel. MapReduce programs are generally written in Java, whereas Hadoop Streaming allows the mapper and reducer to be written in other languages such as Python. In this paper, we show an alternative way of processing growing unstructured content by using Python. We also compare the performance of Java-based and non-Java-based programs.
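To make the Hadoop Streaming idea in the abstract concrete, here is a minimal sketch of a Python mapper and reducer. It assumes a word-count task, which is only an illustration and not necessarily the workload used in the paper; the function names `mapper`, `reducer`, and `run_locally` are hypothetical. It relies on two Hadoop Streaming conventions: records are exchanged as text lines with a tab-separated key and value, and mapper output is sorted by key before it reaches the reducer.

```python
from itertools import groupby


def mapper(lines):
    """Map stage: emit one tab-separated (word, 1) record per token."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(records):
    """Reduce stage: sum the counts per word; input must be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate the pipeline on one machine: map, sort by key, reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job, each function would likely live in its own script that reads `sys.stdin` and prints its records to standard output; `run_locally` only mimics the sort step that Hadoop performs between the two stages, which is handy for checking the logic on a small sample before submitting a job.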
