Column concept determination based on multiple evidences
Author(s) - An Xianxi, You Sihan, Guo Ziyang, Lu Zeguang, Zheng Bo, Shi Shengfei, Song Yan
Publication year - 2019
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5457
Subject(s) - column (typography), computer science, semantics (computer science), matching (statistics), identification (biology), information retrieval, data mining, programming language, mathematics, statistics, telecommunications, botany, frame (networking), biology
Summary - Tables on the web provide rich information. To make full use of web tables, the semantics of their columns must be identified correctly. Missing, misspelled, and abbreviated column names make column semantics identification challenging. To address this challenge, we extract multiple features, including keywords, concepts, and structure, from the content of each column, and identify the column semantics by matching these features. We propose efficient algorithms for both feature extraction and matching. Experimental results on real data sets show that our solution achieves high performance.
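The idea of combining several kinds of evidence from a column's content can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the `PROFILES` dictionary, the Jaccard keyword score, the exact-match structure score, and the equal weights are all assumptions chosen for illustration. Concept evidence, which would need an external knowledge base, is left out.

```python
import re
from collections import Counter


def structure_signature(value):
    """Collapse a cell into a coarse pattern: letter runs -> 'A', digit runs -> '9'."""
    sig = re.sub(r"[A-Za-z]+", "A", value.strip())
    return re.sub(r"\d+", "9", sig)


def column_features(values):
    """Extract keyword and structure evidence from the cells of one column."""
    keywords = set()
    patterns = Counter()
    for v in values:
        keywords.update(re.findall(r"[a-z]+", v.lower()))
        patterns[structure_signature(v)] += 1
    dominant = patterns.most_common(1)[0][0] if patterns else ""
    return {"keywords": keywords, "pattern": dominant}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(features, profile, w_kw=0.5, w_struct=0.5):
    """Weighted sum of keyword overlap and structure agreement (weights are assumed)."""
    kw = jaccard(features["keywords"], profile["keywords"])
    st = 1.0 if features["pattern"] == profile["pattern"] else 0.0
    return w_kw * kw + w_struct * st


def best_concept(values, profiles):
    """Return the name of the candidate concept whose profile scores highest."""
    feats = column_features(values)
    return max(profiles, key=lambda name: score(feats, profiles[name]))


# Hypothetical concept profiles, hand-written for this example only.
PROFILES = {
    "date": {"keywords": set(), "pattern": "9-9-9"},
    "city": {"keywords": {"london", "paris", "new", "york"}, "pattern": "A"},
}

print(best_concept(["2019-01-05", "2018-12-31"], PROFILES))
print(best_concept(["London", "Paris", "New York"], PROFILES))
```

Because the column header is never consulted, a sketch like this is unaffected by absent, misspelled, or abbreviated column names, which is the situation the summary describes.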