A Framework for Distributed Cleaning of Data Streams
Author(s) - Saul Gill, Brian Lee
Publication year - 2015
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2015.05.156
Subject(s) - computer science, data stream mining, internet of things, streams, quality (philosophy), data quality, real time computing, data mining, environmental data, the internet, variable (mathematics), database, embedded system, world wide web, computer network, metric (unit), philosophy, operations management, mathematical analysis, mathematics, epistemology, political science, law, economics
Vast and ever-increasing quantities of data are produced by sensors in the Internet of Things (IoT). The quality of this data can vary widely because of faulty sensors, incorrect calibration, and similar problems. Data quality can be greatly enhanced by cleaning the data before it reaches its end user. This paper reports on the construction of a distributed cleaning system (DCS) that cleans data streams in real time for an environmental case study. A combination of declarative and statistical model-based cleaning methods is applied, and initial results are reported.
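To make the idea of combining declarative and statistical cleaning concrete, the following is a minimal single-process sketch, not the paper's DCS. It assumes a declarative rule expressed as a fixed valid range and a statistical check expressed as a z-score against a sliding window of recently accepted readings. The function names, the temperature-like range, the window size, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import mean, stdev
from typing import Iterable, Iterator, Optional


def declarative_ok(value: float, low: float, high: float) -> bool:
    """Declarative rule (assumed): a reading is valid only inside [low, high]."""
    return low <= value <= high


def clean_stream(
    readings: Iterable[float],
    low: float = -40.0,    # hypothetical lower bound, e.g. degrees Celsius
    high: float = 60.0,    # hypothetical upper bound
    window: int = 5,       # hypothetical sliding-window size
    z_max: float = 3.0,    # hypothetical z-score threshold
) -> Iterator[Optional[float]]:
    """Yield each reading, or None when either check rejects it."""
    recent = deque(maxlen=window)  # most recently accepted readings
    for value in readings:
        # Step 1: declarative check against the fixed range.
        if not declarative_ok(value, low, high):
            yield None
            continue
        # Step 2: statistical check, applied only once the window is full.
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_max:
                yield None
                continue
        # Accepted readings update the window; rejected ones do not.
        recent.append(value)
        yield value


if __name__ == "__main__":
    sample = [20.0, 20.5, 19.5, 20.2, 19.8, 35.0, 20.1, 999.0]
    print(list(clean_stream(sample)))
```

In this sketch the reading 999.0 is rejected by the range rule and 35.0 by the z-score check, while rejected values are kept out of the window so that one outlier does not distort later decisions. A distributed deployment would presumably run such checks on separate nodes, but that arrangement is not specified in the abstract and is not modelled here.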