Helping Novices Avoid the Hazards of Data: Leveraging Ontologies to Improve Model Generalization Automatically with Online Data Sources

Janpuangtong Sasin; Shell Dylan A.

Open Access

Author(s) - Janpuangtong Sasin, Shell Dylan A.

Publication year - 2016

Publication title - AI Magazine

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.597

H-Index - 79

eISSN - 2371-9621

pISSN - 0738-4602

DOI - 10.1609/aimag.v37i2.2626

Subject(s) - computer science, ontology, process (computing), exploit, data science, domain knowledge, domain (mathematical analysis), set (abstract data type), relevance (law), knowledge extraction, data mining, artificial intelligence, mathematical analysis, philosophy, computer security, mathematics, epistemology, political science, law, programming language, operating system

The infrastructure and tools necessary for large‐scale data analytics, formerly the exclusive purview of experts, are increasingly available. Whereas a knowledgeable data miner or domain expert can rightly be expected to exercise caution when required (for example, around misleading conclusions supposedly supported by the data), the nonexpert may benefit from some judicious assistance. This article describes an end‐to‐end learning framework that allows a novice to create models from data easily by helping structure the model‐building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high‐level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including helping to avoid pitfalls such as data dredging. Prudence must be exercised to avoid these hazards, as certain conclusions may only be supported if, for example, there is extra knowledge that gives reason to trust a narrower set of hypotheses. This article adopts the solution of using higher‐level knowledge to allow this sort of domain knowledge to be applied automatically, selecting relevant input attributes and thereby constraining the hypothesis space. We describe how the framework automatically exploits structured knowledge in an ontology to identify relevant concepts, and how a data extraction component can make use of online data sources to find measurements of those concepts so that their relevance can be evaluated. To validate our approach, models of four different problem domains were built using our implementation of the framework. The prediction errors of these models on unseen examples show that our framework, by making use of the ontology, helps to improve model generalization.
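
The abstract does not say how the framework traverses its ontology or selects attributes, so the following is only a minimal sketch of the general idea of using an ontology to narrow the set of input attributes. It assumes the ontology can be treated as an undirected graph of concepts and that "relevant" means "within a few links of the target concept". The toy ontology, the concept names, and the functions `related_concepts` and `select_attributes` are all hypothetical and are not taken from the article.

```python
from collections import deque

# Toy ontology as an undirected adjacency map.
# Assumption: every concept name here is invented for illustration.
ONTOLOGY = {
    "house_price": ["income", "location"],
    "income": ["house_price", "employment"],
    "location": ["house_price", "crime_rate"],
    "employment": ["income"],
    "crime_rate": ["location"],
    "rainfall": ["crop_yield"],
    "crop_yield": ["rainfall"],
}


def related_concepts(ontology, target, max_hops):
    """Breadth-first search: map each concept within max_hops of target to its hop distance."""
    distance = {target: 0}
    queue = deque([target])
    while queue:
        concept = queue.popleft()
        if distance[concept] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in ontology.get(concept, []):
            if neighbor not in distance:
                distance[neighbor] = distance[concept] + 1
                queue.append(neighbor)
    del distance[target]  # the target itself is not an input attribute
    return distance


def select_attributes(available_columns, ontology, target, max_hops=2):
    """Keep only the data columns whose concept lies near the target in the ontology."""
    related = related_concepts(ontology, target, max_hops)
    return sorted(column for column in available_columns if column in related)


# Hypothetical columns found in some data source.
columns = ["income", "crime_rate", "rainfall", "crop_yield", "employment"]
selected = select_attributes(columns, ONTOLOGY, "house_price", max_hops=2)
print(selected)  # ['crime_rate', 'employment', 'income']
```

In this sketch, `rainfall` and `crop_yield` are dropped because nothing in the toy ontology links them to `house_price`. Restricting the inputs in this way shrinks the hypothesis space before any model is fitted, which is the kind of guard against data dredging that the abstract describes.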
