AISLES THROUGH THE CATEGORY FOREST - Utilising the Wikipedia Category System for Corpus Building in Machine Learning

Rüdiger Gleim, Alexander Mehler, Matthias Dehmer, Olga Pustylnikov


Open Access


Publication year - 2007

Language(s) - English

Resource type - Conference proceedings

DOI - 10.5220/0001267101420149

Subject(s) - computer science, natural language processing, artificial intelligence, information retrieval, world wide web

The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant information. It has often been argued that semantic classifications of input documents help to solve this task. But while approaches to supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches to web mining, corresponding corpora are needed. However, the composition of genre- or domain-specific web corpora is still an unsolved problem. Building large corpora of good quality is time-consuming because web pages typically lack reliable meta information. Wikipedia, along with similar approaches to collaborative text production, offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source for corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views for browsing Wikipedia and for constructing domain-specific corpora for machine learning.
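To make the idea of building a domain-specific corpus from a category system concrete, here is a minimal Python sketch. It is not the Wikipedia Category Explorer or the authors' method. It assumes a small hand-made category graph (the dictionaries `SUBCATEGORIES` and `ARTICLES` and the function `collect_corpus` are hypothetical names chosen for illustration) and collects all articles reachable from a root category within a depth limit. Because a collaboratively built category graph is not a strict tree (categories can have several parents and can form cycles), the traversal tracks visited categories.

```python
from collections import deque

# Hypothetical toy data standing in for a real category graph.
# category -> list of subcategories
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics", "Science"],  # back-link creates a cycle
    "Biology": [],
    "Optics": [],
}
# category -> list of articles tagged with it
ARTICLES = {
    "Science": ["Scientific method"],
    "Physics": ["Force", "Energy"],
    "Biology": ["Cell", "Energy"],  # "Energy" has two categories
    "Optics": ["Lens"],
}


def collect_corpus(root, max_depth):
    """Breadth-first walk from `root`, at most `max_depth` levels down.

    Returns a dict mapping each article title to the first category
    under which it was found, which can serve as its class label.
    """
    seen = {root}                 # visited categories (guards against cycles)
    queue = deque([(root, 0)])
    corpus = {}
    while queue:
        category, depth = queue.popleft()
        for title in ARTICLES.get(category, []):
            # Keep the first (shallowest) label for multiply tagged articles.
            corpus.setdefault(title, category)
        if depth == max_depth:
            continue              # do not descend below the depth limit
        for sub in SUBCATEGORIES.get(category, []):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return corpus


if __name__ == "__main__":
    for title, label in collect_corpus("Science", max_depth=2).items():
        print(f"{label}\t{title}")
```

The depth limit is a design choice in this sketch: deeper levels of a category graph tend to drift away from the root topic, so a bounded walk is a simple way to keep the resulting corpus focused.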
