Open Access
Towards building a collection of web archiving research articles
Author(s) - Ayala Brenda Reyes, Caragea Cornelia
Publication year - 2014
Publication title - Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.2014.14505101150
Subject(s) - web crawler, world wide web, crawling, computer science, field (mathematics), context (archaeology), subject (documents), information retrieval, web page, data science, geography, medicine, mathematics, archaeology, pure mathematics, anatomy
The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that the field is relatively new and its literature is scattered across a wide range of journal and conference venues, which makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify which of the crawled documents are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
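
The "bag of words" classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, features beyond titles and abstracts, or the dataset, so everything below is an assumption made for illustration: the multinomial Naive Bayes classifier, the tokenizer, the label names (`web_archiving`, `other`), and the four toy training texts are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier choice)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # label -> number of training documents
        self.word_counts = {}               # label -> Counter of word frequencies
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(bag_of_words(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word, n in bag.items():
                if word in self.vocab:      # words never seen in training are skipped
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: each string stands in for a document's title plus abstract.
train_texts = [
    "Preserving web pages in a national web archive",
    "Crawling strategies for web archiving and preservation",
    "Deep learning for image segmentation in medicine",
    "A survey of protein folding simulation methods",
]
train_labels = ["web_archiving", "web_archiving", "other", "other"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Temporal coherence of archived web pages"))
```

In a setting like the one the abstract describes, the seed bibliography would supply the positive training examples and the crawled documents would be passed to `predict`; a real experiment would also need held-out evaluation data, which this sketch omits.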
