Auto-Prep: Efficient and Automated Data Preprocessing Pipeline

Mehwish Bilal, Ghulam Ali, Muhammad Waseem Iqbal, Muhammad Anwar, Muhammad Sheraz Arshad Malik, Rabiah Abdul Kadir



Open Access


Publication year - 2022

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2022.3198662

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Data preprocessing is a crucial stage of the Machine Learning (ML) pipeline because the quality of the data, and of the information extracted from it at this stage, directly affects a model's ability to learn. However, each transformation task admits many alternative techniques, which can overwhelm an inexperienced user. A simple Python-based auto-preprocessing architecture for Automated Machine Learning (AutoML) is developed to offer automated, interactive, and data-driven support that helps users perform data preprocessing tasks efficiently. The proposed method provides valuable insights into a dataset and handles standard data preprocessing tasks adeptly. First, it detects problems in the data and presents them to the end user through visualizations. Then, after evaluating state-of-the-art candidate techniques, it recommends the most effective data cleaning and preparation method to the user. For evaluation, the proposed architecture is applied to ten diverse datasets to preprocess them automatically before they are passed to an ML algorithm. The results are then compared with those produced by the same ML algorithm on manually preprocessed data. The results show that the approach not only simplifies the whole preprocessing process but also significantly improves model performance.
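The abstract describes two stages, detecting data problems and then recommending a preparation technique after evaluating candidates, without giving their internals. The following is a minimal, dependency-free sketch of that detect-then-recommend pattern, not the authors' implementation. It assumes rows are Python dicts with `None` marking a missing value; the function names (`detect_problems`, `recommend_imputer`, `impute`), the candidate set (mean, median, and mode imputation), and the selection criterion (hide some observed values and measure how well each candidate reconstructs them) are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, median, mode
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]

# Hypothetical candidate imputation strategies; the paper's actual candidate set is not given here.
CANDIDATES: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "median": median,
    "mode": mode,
}


def detect_problems(rows: List[Row]) -> Dict[str, object]:
    """Report two simple data problems: missing values per column and exact duplicate rows."""
    columns = list(rows[0].keys())
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"missing": missing, "duplicates": duplicates}


def recommend_imputer(values: List[Optional[float]], holdout: float = 0.2,
                      seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by how well it reconstructs artificially hidden known values."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    shuffled = observed[:]
    random.Random(seed).shuffle(shuffled)
    k = max(1, int(len(shuffled) * holdout))
    hidden, kept = shuffled[:k], shuffled[k:]
    scores: Dict[str, float] = {}
    for name, strategy in CANDIDATES.items():
        fill = strategy(kept)
        # Mean absolute error between the fill value and the hidden true values.
        scores[name] = sum(abs(h - fill) for h in hidden) / len(hidden)
    best = min(scores, key=scores.get)  # lowest error wins; ties keep dict order
    return best, scores


def impute(values: List[Optional[float]], strategy_name: str) -> List[float]:
    """Replace missing entries with the fill value computed by the chosen strategy."""
    observed = [v for v in values if v is not None]
    fill = CANDIDATES[strategy_name](observed)
    return [fill if v is None else v for v in values]
```

For example, `impute([1.0, None, 3.0], "mean")` returns `[1.0, 2.0, 3.0]`. Scoring candidates on hidden values keeps the sketch self-contained; an alternative would be to score each candidate by a downstream model's cross-validated performance, which is closer to how the abstract evaluates the overall pipeline.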
