Improving database quality through eliminating duplicate records

Mingzhen Wei; Andrew H. Sung; Martha Cather

Open Access

Publication year - 2006

Publication title - Data Science Journal

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.358

H-Index - 21

ISSN - 1683-1470

DOI - 10.2481/dsj.5.127

Subject(s) - computer science , scope (computer science) , usability , transparency (behavior) , metadata , data science , open data , world wide web , implementation , data publishing , reuse , publishing , database , software engineering , political science , computer security , engineering , human–computer interaction , law , programming language , waste management

Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key to resolving this problem: it identifies string values that are semantically equivalent but syntactically different in representation. This paper considers token-based solutions and proposes a general field matching framework that generalizes the field matching problem across different domains. By introducing the concept of String Matching Points (SMP) into string comparison, the approach improves string matching accuracy and efficiency compared with other commonly applied field matching algorithms. The paper also discusses how field matching algorithms are developed from the general framework. The framework and the corresponding algorithm are tested on a public data set from the NASA publication abstract database. The approach can be applied to similar problems in other databases.
