Open Access
Rapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing
Author(s) - Lawrence Muchemi, Gregory Grefenstette
Publication year - 2016
Publication title - International Journal of Artificial Intelligence and Applications
Language(s) - English
Resource type - Journals
eISSN - 0976-2191
pISSN - 0975-900X
DOI - 10.5121/ijaia.2016.7401
Subject(s) - computer science , information retrieval , computational biology , world wide web , biology
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method builds the taxonomy from sentence-level word co-occurrence frequencies, while the second bootstraps a Word2Vec-based algorithm with a directed crawler. We exploit DMOZ, the multilingual open-content directory of the World Wide Web, to seed the crawl, and the domain name to direct it. The resulting domain corpus is then input to our algorithm, which automatically induces taxonomies. The induced taxonomies provide hierarchical semantic dimensions for faceted browsing. As part of an ongoing personal semantics project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook, Instagram, Flickr) with the objective of enhancing an individual's exploration of their personal information through faceted searching. We also perform a comprehensive corpus-based evaluation of the algorithms on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that the induced taxonomies are of high quality.
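To make the first method more concrete, the following is a minimal sketch of sentence-level word co-occurrence counting. The abstract does not give the authors' exact procedure, so this is only an illustration: the function name `sentence_cooccurrence`, the naive sentence splitter, the lowercase tokenization, and the sample text are all assumptions, and the step that turns counts into a hierarchy is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(text, vocabulary):
    """Count how often pairs of vocabulary terms appear in the same sentence.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    vocab = {term.lower() for term in vocabulary}
    pair_counts = Counter()
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(tokens & vocab)
        # Each unordered pair is counted once per sentence.
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Made-up sample text in the spirit of the medical (diseases) domain.
sample = (
    "Influenza causes fever and cough. "
    "Fever is common in malaria. "
    "Cough and fever often accompany influenza."
)
counts = sentence_cooccurrence(sample, ["influenza", "fever", "cough", "malaria"])
```

Pairs that co-occur frequently, such as `("cough", "fever")` in the sample, would be candidates for related nodes in an induced taxonomy; how the paper decides parent and child direction is not stated in the abstract.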