Open Access
A Comparison of Mining Incomplete and Inconsistent Data
Author(s) - Jerzy W. Grzymala-Busse
Publication year - 2017
Publication title - Information Technology and Control
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.286
H-Index - 19
eISSN - 2335-884X
pISSN - 1392-124X
DOI - 10.5755/j01.itc.46.2.17330
Subject(s) - probabilistic logic, data mining, missing data, word error rate, type I and type II errors, statistics, computer science, data set, mathematics, artificial intelligence
We present experimental results comparing incompleteness and inconsistency in data. Our experiments were conducted on 141 data sets, including 71 incomplete and 62 inconsistent data sets, created from eight original numerical data sets. We used the Modified Learning from Examples Module, version 2 (MLEM2) rule induction algorithm for data mining. Eight types of data sets were combined with three kinds of probabilistic approximations, giving 24 combinations. In 12 of these combinations the error rate, computed by ten-fold cross validation, was smaller for inconsistent data (two-tailed test, 5% significance level). For one data set, combined with all three probabilistic approximations, the error rate was smaller for incomplete data. For the remaining nine combinations the difference in performance was not statistically significant. Thus, we may claim that there is some experimental evidence that incompleteness is generally worse than inconsistency for data mining.
