A nonparametric framework for water consumption data cleansing: an application to a smart water network in Naples (Italy)
Author(s) - Roberta Padulano, Giuseppe Del Giudice
Publication year - 2020
Publication title - Journal of Hydroinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.654
H-Index - 50
eISSN - 1465-1734
pISSN - 1464-7141
DOI - 10.2166/hydro.2020.133
Subject(s) - nonparametric statistics, data mining, anomaly detection, computer science, homogeneous, missing data, metering mode, reliability (semiconductor), series (stratigraphy), consumption (sociology), field (mathematics), anomaly (physics), data collection, database, econometrics, statistics, machine learning, engineering, mathematics, geology, mechanical engineering, paleontology, social science, power (physics), physics, condensed matter physics, combinatorics, quantum mechanics, sociology, pure mathematics
Remote monitoring and collection of water consumption data have gained pivotal importance for understanding, modelling and predicting demand. However, most analyses performed on such databases can be jeopardized by inconsistencies arising from technological or behavioural issues, which cause significant amounts of missing or anomalous values. This paper presents a nonparametric, unsupervised approach for investigating the reliability of a consumption database. The approach is applied to the dataset of a district metering area in Naples (Italy) and focuses on detecting suspicious amounts of zero or outlying data. Results show that the methodology is effective in identifying critical issues both in terms of unreliable time series, namely time series with very large amounts of invalid data, and in terms of unreliable data, namely values suspiciously different from suitable central parameters, irrespective of the source of the anomaly. The proposed approach is therefore suitable for large databases when no prior information is available about the underlying probability distribution of the data. It can also be coupled with other nonparametric, pattern-based methods to ensure that the database under analysis is homogeneous in terms of water uses.
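As a rough illustration of the two checks named in the abstract (too many zero readings in a series, and values far from a central parameter), the sketch below uses the median and the median absolute deviation (MAD), a common nonparametric choice. This is not the authors' procedure: the function names, the choice of median/MAD, the zero-share threshold of 0.5 and the cut-off k = 3 are all assumptions made for the example.

```python
from statistics import median


def zero_fraction(series):
    """Share of readings in the series that are exactly zero."""
    return sum(1 for x in series if x == 0) / len(series)


def mad_outliers(series, k=3.0):
    """Indices of nonzero readings more than k scaled MADs from the median.

    Assumption: median/MAD stands in for the 'suitable central parameters'
    mentioned in the abstract; the paper may use something different.
    """
    nonzero = [x for x in series if x != 0]
    if not nonzero:
        return []
    centre = median(nonzero)
    mad = median(abs(x - centre) for x in nonzero)
    if mad == 0:
        # No spread among nonzero readings: nothing can be ranked as outlying.
        return []
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [
        i for i, x in enumerate(series)
        if x != 0 and abs(x - centre) > k * scale
    ]


def screen(series, max_zero_share=0.5, k=3.0):
    """Flag an unreliable series (too many zeros) and unreliable values."""
    share = zero_fraction(series)
    return {
        "zero_share": share,
        "unreliable_series": share > max_zero_share,  # hypothetical threshold
        "outlier_indices": mad_outliers(series, k),
    }


if __name__ == "__main__":
    readings = [10, 12, 11, 0, 13, 95, 12, 0, 11, 10]  # made-up hourly volumes
    print(screen(readings))
```

Because both checks rely only on order statistics, no distributional assumption about the readings is needed, which matches the setting the abstract describes.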