Big data and machine learning framework for clouds and its usage for text classification
Author(s) - Pintye István, Kail Eszter, Kacsuk Péter, Lovas Róbert
Publication year - 2020
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.6164
Subject(s) - software deployment, computer science, cloud computing, scalability, spark (programming language), big data, usability, data science, computer cluster, software engineering, distributed computing, world wide web, database, operating system, programming language
Reference architectures for big data and machine learning comprise not only interconnected building blocks but also important considerations for, among others, scalability, manageability, and usability. Even when building on such reference architectures, the automated deployment of distributed toolsets and frameworks on various clouds remains challenging because of the diversity of technologies and protocols. The paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on the cloud‐agnostic Occopus orchestrator tool for automating its deployment and maintenance stages. The presented approach has been demonstrated and validated with a new, promising text classification application on the Hungarian academic research infrastructure, the OpenStack‐based MTA Cloud. The paper explains the concept and the applied components, and illustrates their usage with real use‐case measurements.
