Topic Modeling for Wikipedia Link Disambiguation
Author(s) - Bradley Skaggs, Lise Getoor
Publication year - 2014
Publication title - ACM Transactions on Office Information Systems
Language(s) - English
Resource type - Journals
eISSN - 1558-1152
pISSN - 0734-2047
DOI - 10.1145/2633044
Subject(s) - hyperlink , computer science , link (geometry) , link analysis , information retrieval , encyclopedia , task (project management) , process (computing) , world wide web , natural language processing , web page , computer network , management , library science , economics , operating system
Many articles in the online encyclopedia Wikipedia contain hyperlinks to ambiguous article titles; these ambiguous links should be replaced with links to unambiguous articles, a process known as disambiguation. We propose a novel statistical topic model based on link text, the Link Text Topic Model (LTTM), and use it to suggest new targets for ambiguous links. To evaluate our model, we describe a method for extracting ground truth for this link disambiguation task from edits made to Wikipedia during a specific time period. We use this ground truth to demonstrate the superiority of LTTM over existing link-based and content-based approaches to disambiguating links in Wikipedia. Finally, we build a web service that uses LTTM to make suggestions to human editors who want to fix ambiguous links in Wikipedia.
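The abstract does not describe how LTTM scores candidate targets. To illustrate only the general idea behind topic-based link suggestion (this is a generic content-based baseline, not LTTM itself), here is a minimal sketch that assumes topic proportions have already been inferred for the article containing the ambiguous link and for each candidate target, and ranks the candidates by cosine similarity. The function names, the three-topic vectors, and the "Mercury" candidates are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Guard against an all-zero vector, which has no defined direction.
    return dot / norm if norm else 0.0


def rank_candidates(context_topics, candidate_topics):
    """Return candidate titles sorted by similarity to the link's context, best first.

    Ties are broken alphabetically by title so the ordering is deterministic.
    """
    scored = [(cosine(context_topics, vec), title) for title, vec in candidate_topics.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Hypothetical example with three topics: astronomy, chemistry, mythology.
context = [0.7, 0.2, 0.1]  # topics of the article containing the ambiguous link "Mercury"
candidates = {
    "Mercury (planet)": [0.8, 0.1, 0.1],
    "Mercury (element)": [0.1, 0.85, 0.05],
    "Mercury (mythology)": [0.05, 0.05, 0.9],
}
print(rank_candidates(context, candidates))
# → ['Mercury (planet)', 'Mercury (element)', 'Mercury (mythology)']
```

A real system would infer these topic vectors from text with a trained model; hard-coding them here keeps the sketch focused on the ranking step.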