Open Access

A Comparison of Mining Incomplete and Inconsistent Data

Author(s) - Jerzy W. Grzymala-Busse

Publication year - 2017

Publication title - Information Technology and Control

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.286

H-Index - 19

eISSN - 2335-884X

pISSN - 1392-124X

DOI - 10.5755/j01.itc.46.2.17330

Subject(s) - probabilistic logic, data mining, missing data, word error rate, type I and type II errors, statistics, computer science, data set, mathematics, artificial intelligence

We present experimental results comparing the effects of incompleteness and inconsistency on data mining. Our experiments were conducted on 141 data sets, including 71 incomplete and 62 inconsistent data sets created from eight original numerical data sets. We used the Modified Learning from Examples Module, version 2 (MLEM2) rule induction algorithm for data mining. Among the 24 combinations of eight types of data sets with three kinds of probabilistic approximations used in the experiments, the error rate, computed by ten-fold cross validation, was significantly smaller for inconsistent data in 12 combinations (two-tailed test, 5% significance level). For one data set, combined with all three probabilistic approximations, the error rate was smaller for incomplete data. For the remaining nine combinations, the difference in performance was not statistically significant. Thus, there is some experimental evidence that incompleteness is generally worse than inconsistency for data mining.
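The abstract does not define the probabilistic approximations it uses. As a rough illustration only, the sketch below assumes the common rough-set definition: cases with identical attribute values form a block, and a block belongs to the approximation of a concept when the conditional probability of the concept within that block is at least a threshold `alpha`. The function name, the toy data, and the choice of thresholds are hypothetical, and this is not the MLEM2 algorithm itself.

```python
from collections import defaultdict
from fractions import Fraction


def probabilistic_approximation(rows, decisions, concept, alpha):
    """Return the set of case indices in the alpha-approximation of `concept`.

    Assumed definition (not taken from the abstract): the union of all
    blocks B of indiscernible cases with Pr(concept | B) >= alpha.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    # Group cases that share identical attribute values.
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        blocks[tuple(row)].append(index)

    approximation = set()
    for members in blocks.values():
        hits = sum(1 for i in members if decisions[i] == concept)
        # Exact rational arithmetic avoids floating-point edge cases.
        if Fraction(hits, len(members)) >= alpha:
            approximation.update(members)
    return approximation


# Hypothetical inconsistent toy data: identical rows, conflicting decisions.
ROWS = [
    ("high", "yes"), ("high", "yes"), ("high", "yes"),
    ("low", "yes"),
    ("low", "no"), ("low", "no"), ("low", "no"),
]
DECISIONS = ["flu", "flu", "no_flu", "flu", "flu", "no_flu", "no_flu"]

if __name__ == "__main__":
    # alpha = 1 keeps only blocks that are entirely inside the concept.
    print(sorted(probabilistic_approximation(ROWS, DECISIONS, "flu", 1)))
    # alpha = 1/2 keeps blocks where the concept is at least as likely as not.
    print(sorted(probabilistic_approximation(ROWS, DECISIONS, "flu", Fraction(1, 2))))
    # A small positive alpha keeps every block that overlaps the concept.
    print(sorted(probabilistic_approximation(ROWS, DECISIONS, "flu", Fraction(1, 100))))
```

On this toy table the three thresholds select progressively larger sets of cases, which is the sense in which different approximations can give a rule induction algorithm different training input for the same inconsistent data.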
