Mim: A Merge Iteration and Its Applications for Big Data

Jie Song, Han Wang, Yichuan Zhang, Yubing Bao, Ge Yu


Open Access


Author(s) - Jie Song, Han Wang, Yichuan Zhang, Yubing Bao, Ge Yu

Publication year - 2018

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2018.2879779

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

With the rapid development of technologies such as the Internet, sensors, and bioinformatics, data has grown explosively. In the big data era, more and more iterative algorithms are applied in data mining and machine learning. In most situations, an iterative algorithm runs over an entire dataset that is merged from partial ones. Given the iterative results on the partial datasets, it is efficient if the result on the entire dataset can be merged from them; otherwise, recomputing on the entire dataset is time-consuming. Unfortunately, current iteration models do not support merging results. We propose the merge iteration computing model (Mim) in this paper. Mim is a solution, not a platform. It states how to execute an iterative algorithm efficiently by reusing existing results without sacrificing accuracy, and this mechanism suits most iterative algorithms. We explain the in-partition iteration step, the error evaluation step, the optional compensation step, and the merge iteration step of Mim. The in-partition iteration step is a prerequisite of merge iteration and should be completed before the partial datasets are merged. We also analyze the accuracy and performance advantages of Mim theoretically. As an application scenario, we implement Mim on the Spark framework and apply it to financial data analysis in a city. Finally, through a series of experiments, we demonstrate the efficiency and accuracy of Mim on the PageRank and K-means algorithms. Across the test cases, the maximum optimization ratio of Mim compared with regular iteration is 25% on PageRank and 56% on K-means, and the errors are negligible.
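The idea of reusing partial results can be illustrated with a minimal warm-start sketch for one-dimensional K-means. It is not the authors' implementation: the abstract does not define the error evaluation or compensation steps, so they are left out, and the function names (`kmeans`, `merge_centroids`), the toy data, and the rule of merging centroids by size-weighted averaging are all assumptions made for illustration. The sketch runs K-means inside each partition first, merges the per-partition centroids, and then iterates on the full dataset starting from the merged centroids instead of from scratch.

```python
def kmeans(points, centroids, tol=1e-9, max_iter=100):
    """Lloyd's algorithm in 1-D. Returns (centroids, cluster sizes, iterations used)."""
    centroids = list(centroids)
    for iterations in range(1, max_iter + 1):
        # Assign every point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            updated.append(sum(members) / len(members) if members else old)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tol:
            break
    # Sizes come from the last assignment pass.
    sizes = [labels.count(j) for j in range(len(centroids))]
    return centroids, sizes, iterations


def merge_centroids(partials):
    """Hypothetical merge rule: size-weighted average of per-partition centroids.

    Clusters are matched across partitions by sorted position, which only
    makes sense in this 1-D toy setting.
    """
    ordered = [sorted(zip(cents, sizes)) for cents, sizes, _ in partials]
    merged = []
    for j in range(len(ordered[0])):
        total = sum(part[j][1] for part in ordered)
        if total == 0:  # cluster empty everywhere: fall back to a plain mean
            merged.append(sum(part[j][0] for part in ordered) / len(ordered))
        else:
            merged.append(sum(part[j][0] * part[j][1] for part in ordered) / total)
    return merged


# Toy partial datasets (made up for this example).
part_a = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
part_b = [1.1, 0.9, 1.3, 9.1, 8.9, 9.3]
initial = [0.0, 5.0]

# In-partition iteration: done before the partitions are merged.
partials = [kmeans(part_a, initial), kmeans(part_b, initial)]

# Merge iteration: iterate on the full dataset from the merged centroids.
full = part_a + part_b
warm_centroids, _, warm_iters = kmeans(full, merge_centroids(partials))

# Regular iteration: start from scratch on the full dataset.
cold_centroids, _, cold_iters = kmeans(full, initial)

print("merge iteration:", warm_centroids, "iterations:", warm_iters)
print("regular iteration:", cold_centroids, "iterations:", cold_iters)
```

On this toy input both runs converge to the same centroids, and the warm-started run needs fewer passes over the full dataset. The saving on real data depends on how well the per-partition results approximate the result on the merged dataset.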
