Post processing wrapper generated tables for labeling anonymous datasets
Author(s) - Emdad Ahmed, Hasan M. Jamil
Publication year - 2009
Publication title - digitalcommons - waynestate (wayne state university)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/1651587.1651602
Subject(s) - computer science , information retrieval , schema (genetic algorithms) , context (archaeology) , data mining , machine learning , artificial intelligence , biology , paleontology
Many wrappers generate tables without column names for human consumption, because the meaning of the columns is apparent from the context and easy for humans to understand. In emerging applications, however, labels are needed for autonomous assignment and schema mapping, where machines try to understand the tables. Autonomous label assignment is critical in high-volume data processing where ad hoc mediation, extraction, and querying are involved. We propose an algorithm, Lads (Labeling Anonymous Datasets), which can holistically label tabular web documents. The algorithm has been tested on anonymous datasets from a number of sites, e.g., music, movie, political, demographic, and athletic, obtained through different search engines such as Google, Yahoo, and MSN. We present the comparative probabilities of attributes being candidate labels; the results seem very promising, with a probability as high as 93% of assigning a good label to an anonymous attribute. To the best of our knowledge, this is the first approach of its kind to base label assignment on the recommendations of multiple search engines.