
Prediction of Missing Data in Rainfall Dataset by using Simple Statistical Method
Author(s) -
Izzati Amani Mohd Jafri,
Norazian Mohamed Noor,
Ahmad Zia Ul-Saufie,
Suwardi Annas
Publication year - 2020
Publication title -
IOP Conference Series: Earth and Environmental Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.179
H-Index - 26
eISSN - 1755-1315
pISSN - 1755-1307
DOI - 10.1088/1755-1315/616/1/012005
Subject(s) - missing data, imputation (statistics), statistics, mean squared error, statistic, computer science, data mining, interpolation (computer graphics), mathematics, artificial intelligence, motion (physics)
Almost all data obtained from hydrological stations contain missing values. The problem usually arises from equipment failure, maintenance work and human error. An incomplete dataset reduces the power of a statistical analysis and can bias estimates because of systematic differences between observed and unobserved data. In this study, four simple statistical methods, namely Series Mean, Average Mean Top Bottom, Linear Interpolation and Nearest Neighbour, were applied to predict the missing values in a rainfall dataset. Daily rainfall data from nine selected monitoring stations, covering 2009 to 2018, were summarised using descriptive statistics. Missing values were then randomly simulated in the dataset at four percentages (5%, 10%, 15% and 20%) using the Statistical Package for the Social Sciences (SPSS) software. The performance of the imputation methods was evaluated using four performance indicators: Mean Absolute Error, Root Mean Squared Error, Prediction Accuracy and Index of Agreement. Overall, Linear Interpolation was selected as the best imputation method for predicting the missing data in the rainfall dataset.
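
The four imputation methods and the performance indicators named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' implementation: Average Mean Top Bottom is taken as the mean of the nearest observed values before and after a gap, Prediction Accuracy is taken as the Pearson correlation between predicted and observed values, and Index of Agreement is taken in Willmott's squared-error form. The function names and the toy series are hypothetical and are not data from the study.

```python
import math

def series_mean(x):
    """Fill every gap (None) with the mean of all observed values."""
    observed = [v for v in x if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in x]

def _neighbour(x, i, step):
    """(index, value) of the closest observed value from i in direction step, else None."""
    j = i + step
    while 0 <= j < len(x):
        if x[j] is not None:
            return j, x[j]
        j += step
    return None

def _fill(x, rule):
    """Fill each gap with rule(i, prev, nxt); edge gaps copy their only neighbour."""
    out = list(x)
    for i, v in enumerate(x):
        if v is None:
            prev, nxt = _neighbour(x, i, -1), _neighbour(x, i, +1)
            if prev is None or nxt is None:
                out[i] = (prev or nxt)[1]
            else:
                out[i] = rule(i, prev, nxt)
    return out

def average_mean_top_bottom(x):
    # Assumption: mean of the nearest observed value above and below the gap.
    return _fill(x, lambda i, p, n: (p[1] + n[1]) / 2)

def linear_interpolation(x):
    # Straight line between the nearest observed values on either side.
    return _fill(x, lambda i, p, n: p[1] + (n[1] - p[1]) * (i - p[0]) / (n[0] - p[0]))

def nearest_neighbour(x):
    # Assumption: ties go to the earlier observation.
    return _fill(x, lambda i, p, n: p[1] if i - p[0] <= n[0] - i else n[1])

def mae(p, o):
    return sum(abs(pi - oi) for pi, oi in zip(p, o)) / len(o)

def rmse(p, o):
    return math.sqrt(sum((pi - oi) ** 2 for pi, oi in zip(p, o)) / len(o))

def prediction_accuracy(p, o):
    # Assumption: Pearson correlation; undefined when either series is constant.
    pm, om = sum(p) / len(p), sum(o) / len(o)
    cov = sum((pi - pm) * (oi - om) for pi, oi in zip(p, o))
    return cov / math.sqrt(sum((pi - pm) ** 2 for pi in p) * sum((oi - om) ** 2 for oi in o))

def index_of_agreement(p, o):
    # Assumption: Willmott's form, 1 - squared error / potential error.
    om = sum(o) / len(o)
    sse = sum((pi - oi) ** 2 for pi, oi in zip(p, o))
    potential = sum((abs(pi - om) + abs(oi - om)) ** 2 for pi, oi in zip(p, o))
    return 1 - sse / potential

# Toy series (made-up values): hide three known values, impute, score at the gaps only.
complete = [0.0, 2.0, 5.0, 8.0, 4.0, 6.0, 8.0, 10.0]
gaps = [2, 5, 6]
with_gaps = [None if i in gaps else v for i, v in enumerate(complete)]

for method in (series_mean, average_mean_top_bottom, linear_interpolation, nearest_neighbour):
    filled = method(with_gaps)
    p = [filled[i] for i in gaps]
    o = [complete[i] for i in gaps]
    print(method.__name__, round(mae(p, o), 3), round(rmse(p, o), 3),
          round(index_of_agreement(p, o), 3))
```

The evaluation mirrors the simulate-then-score design described in the abstract: known values are hidden, imputed, and compared with the originals only at the hidden positions. Prediction Accuracy is left out of the loop because, under the Pearson assumption, it is undefined for Series Mean, whose predictions are constant. The hidden values in this toy series happen to lie on straight lines, so Linear Interpolation recovers them exactly; that is a property of the example and says nothing about the study's results.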