
An energy-aware data cleaning workflow for real-time stream processing in the internet of things
Author(s) - Egberto A. R. de Oliveira, Flávia C. Delicato, Marta Lima de Queirós Mattoso
Publication year - 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/courb.2020.12354
Subject(s) - computer science, stream processing, workflow, edge computing, edge device, data stream mining, wireless sensor network, data processing, enhanced data rates for gsm evolution, energy consumption, raw data, the internet, software deployment, data stream, real time computing, distributed computing, computer network, database, internet of things, embedded system, data mining, engineering, cloud computing, telecommunications, world wide web, operating system, programming language, electrical engineering
The Internet of Things (IoT) has recently transformed the internet, enabling communication between all kinds of objects (things). The growing number of sensors and smart devices has enhanced data creation and collection capabilities and led to an explosion of generated data in the form of data streams. Processing these data streams is complex and presents both challenges and opportunities for the stream processing field. Because sensor-generated data inherently lacks accuracy and completeness, the quality of raw data is often poor. Data cleaning tasks are required to help increase the quality of the data processed in an IoT application. This work proposes a data stream processing workflow for IoT to be deployed at the edge of the network. It performs fast data cleaning while keeping the power consumption of edge and sensor nodes low. The edge computing paradigm is used to bring the data cleaning task closer to the data sources and allow actions to be triggered immediately. In addition, an energy-aware data collection component is designed to reduce network traffic and, as a consequence, decrease the power consumption of the network devices. The proposed workflow enables the deployment of long-running real-time processing systems in remote outdoor environments.
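The abstract does not state which cleaning rules or collection policy the workflow uses. The sketch below is only a rough illustration of the two ideas it names, under stated assumptions: cleaning is modeled as dropping missing and out-of-range samples, and energy-aware collection is modeled as a send-on-delta filter that suppresses readings close to the last forwarded value. The names `Reading`, `clean`, and `send_on_delta`, and all thresholds, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class Reading:
    sensor_id: str
    timestamp: float
    value: Optional[float]  # None models a missing sample


def clean(readings: Iterable[Reading], low: float, high: float) -> Iterator[Reading]:
    """Assumed cleaning rule: drop missing or physically implausible samples."""
    for r in readings:
        if r.value is None:
            continue  # incomplete sample
        if not (low <= r.value <= high):
            continue  # outside the plausible range for this sensor type
        yield r


def send_on_delta(readings: Iterable[Reading], delta: float) -> Iterator[Reading]:
    """Assumed collection policy: forward a reading only if it differs from
    the last forwarded value of the same sensor by at least `delta`.
    Expects already-cleaned readings (no missing values)."""
    last_sent: Dict[str, float] = {}
    for r in readings:
        prev = last_sent.get(r.sensor_id)
        if prev is None or abs(r.value - prev) >= delta:
            last_sent[r.sensor_id] = r.value
            yield r  # only these readings would cost a network transmission


# Hypothetical temperature stream with one missing and one implausible sample.
raw = [
    Reading("t1", 0.0, 21.0),
    Reading("t1", 1.0, 21.1),
    Reading("t1", 2.0, None),
    Reading("t1", 3.0, 150.0),
    Reading("t1", 4.0, 22.0),
]
forwarded = list(send_on_delta(clean(raw, low=-40.0, high=60.0), delta=0.5))
print([r.value for r in forwarded])  # [21.0, 22.0]
```

In this toy run, two of the five raw samples are transmitted. Fewer transmissions is the mechanism by which reduced traffic could translate into lower power draw; the actual savings depend on the hardware and the policy the paper implements.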