Open Access
INSTANCE BASED TABLE INTEGRATION ALGORITHM FOR MULTILINGUAL TABLES ON THE WEB
Author(s) - Daisuke Ikeda
Publication year - 2003
Publication title - Bulletin of Informatics and Cybernetics
Language(s) - English
Resource type - Journals
eISSN - 2435-743X
pISSN - 0286-522X
DOI - 10.5109/13520
Subject(s) - table (database), computer science, algorithm, information retrieval, data mining
In this paper, we define the table integration problem: given two tables, determine the correct mapping between their fields. A table is a set of instances of a record, which consists of fields. A field is a pair of an attribute name and a sequence of attribute values of the same type. We present an algorithm for this problem that uses only the instance values of the tables, not their schemas or attribute names. Given tables, the algorithm calculates two numerical features for each field from character codes and then finds the correspondence between fields across the tables. The novelty of the algorithm is that it uses the character code chart of the language in which the table contents are written, which allows a field to be represented by only two types of features. The algorithm requires neither an attribute value contained in all input tables nor attribute names, so it is suitable for tables obtained from Web data, as long as they are written in the same language. Applying the algorithm to real Web data written in five languages (Chinese, English, German, Japanese, and Korean), we demonstrate that it yields accurate results and is robust to errors.
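The abstract does not say which two features are computed, so the following Python sketch is only an illustration of the general idea, not the paper's algorithm. It assumes the two features are the mean and the standard deviation of the Unicode code points in a field's values, and it matches fields by exhaustively searching for the one-to-one assignment with the smallest total squared feature distance. The function names `field_features` and `match_fields` are hypothetical.

```python
from itertools import permutations
from statistics import mean, pstdev


def field_features(values):
    """Return two numbers describing a field: the mean and the population
    standard deviation of the code points of all characters in its values.
    (Assumed features; the abstract does not specify them.)"""
    codes = [ord(ch) for value in values for ch in value]
    if not codes:
        return (0.0, 0.0)
    return (mean(codes), pstdev(codes))


def match_fields(table_a, table_b):
    """Map each field index of table_a to a field index of table_b.

    A table is given as a list of fields, each field a list of strings.
    Attribute names are never consulted. Every one-to-one assignment is
    tried, so this is practical only for a handful of fields, and it
    assumes table_a has no more fields than table_b.
    """
    feats_a = [field_features(field) for field in table_a]
    feats_b = [field_features(field) for field in table_b]

    def cost(assignment):
        # Total squared distance between paired feature vectors.
        return sum(
            (fa[0] - feats_b[j][0]) ** 2 + (fa[1] - feats_b[j][1]) ** 2
            for fa, j in zip(feats_a, assignment)
        )

    best = min(permutations(range(len(feats_b)), len(feats_a)), key=cost)
    return dict(enumerate(best))


# Two small tables with the same kinds of fields in different order.
table_a = [["Alice", "Bob", "Carol"], ["120", "85", "300"]]
table_b = [["99", "450", "12"], ["Dave", "Erin", "Frank"]]
mapping = match_fields(table_a, table_b)
```

In this toy case the name field and the numeric field occupy different regions of the code chart, so their features are far apart and the fields pair up even though the two tables share no attribute value. Both tables must use the same character set for the features to be comparable, which matches the same-language condition stated in the abstract.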
