Mim: A Merge Iteration and Its Applications for Big Data
Author(s) - Jie Song, Han Wang, Yichuan Zhang, Yubing Bao, Ge Yu
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2879779
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
With the rapid development of technologies such as the Internet, sensors, and bioinformatics, data has grown explosively. In the big data era, more and more iterative algorithms are applied in data mining and machine learning. In most situations, these algorithms compute over an entire dataset that is merged from partial ones. Given the iterative results on the partial datasets, it is efficient to obtain the result on the entire dataset by merging them; otherwise, recomputing on the entire dataset is time consuming. Unfortunately, current iteration models do not support merging results. In this paper, we propose the merge iteration computing model (Mim). Mim is a solution rather than a platform: it describes how to execute iterative algorithms effectively by reusing existing results without sacrificing accuracy, and this mechanism is suitable for most iterative algorithms. We explain the four steps of Mim: the in-partition iteration step, the error evaluation step, the optional compensation step, and the merge iteration step. The in-partition iteration step is a prerequisite of merge iteration and must be done before the partial datasets are merged. We also analyze the accuracy and performance advantages of Mim theoretically. As an application scenario, we implement Mim on the Spark framework and apply it to financial data analysis in a city. Finally, through a series of experiments, we demonstrate the efficiency and accuracy of Mim on the PageRank and K-means algorithms. Across the test cases, the maximum optimization ratio of Mim compared with regular iteration is 25% on PageRank and 56% on K-means, and the errors are negligible.
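The abstract does not spell out Mim's algorithms, so the following is only a minimal sketch under stated assumptions. It reads "merge iteration" as warm-starting Lloyd's K-means on the merged dataset from the in-partition results, and it uses a size-weighted clustering of the partial centroids as an assumed merge rule. The error evaluation and compensation steps and the Spark implementation are not reproduced; `kmeans`, `merge_partials`, `blob`, and the synthetic partitions are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init: Sequence[Point],
           weights: Optional[Sequence[float]] = None,
           max_iter: int = 100, tol: float = 1e-12):
    """Weighted Lloyd's K-means from given initial centroids.

    Returns (centroids, cluster_weights, iterations_used).
    """
    if weights is None:
        weights = [1.0] * len(points)
    centroids: List[Point] = [tuple(c) for c in init]
    k, dim = len(centroids), len(points[0])
    for it in range(1, max_iter + 1):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            # Assign each point to its nearest current centroid.
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centroids[j]
               for j in range(k)]
        shift = max(dist2(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            return centroids, mass, it
    return centroids, mass, max_iter


def merge_partials(partials):
    """Hypothetical merge rule: cluster the partial centroids, weighted by cluster size."""
    pooled = [c for cents, _, _ in partials for c in cents]
    pooled_w = [w for _, mass, _ in partials for w in mass]
    merged, _, _ = kmeans(pooled, partials[0][0], weights=pooled_w)
    return merged


# Hypothetical data: two partitions drawn around the same two cluster centres.
rng = random.Random(0)


def blob(cx: float, cy: float, n: int) -> List[Point]:
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


part_a = blob(0, 0, 50) + blob(10, 10, 50)
part_b = blob(0, 0, 40) + blob(10, 10, 60)

# In-partition iteration: each partition is clustered on its own.
partials = [kmeans(p, [p[0], p[-1]]) for p in (part_a, part_b)]

# Merge iteration (assumed form): warm-start on the union from the merged results.
full = part_a + part_b
final, _, it_merge = kmeans(full, merge_partials(partials))

# Regular iteration baseline: cold start on the union.
base, _, it_base = kmeans(full, [full[0], full[-1]])
print(it_merge, it_base)
```

Weighting each partial centroid by its cluster size means the merged seed equals the full-data centroid whenever the partial cluster assignments agree with those on the merged data, so the warm start should need no more iterations than the cold start here. This toy case says nothing about the size of real gains, which depend on the data and the algorithm.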