Towards building a collection of web archiving research articles

Open Access

Author(s) - Ayala Brenda Reyes, Caragea Cornelia

Publication year - 2014

Publication title - Proceedings of the American Society for Information Science and Technology

Language(s) - English

Resource type - Journals

eISSN - 1550-8390

pISSN - 0044-7870

DOI - 10.1002/meet.2014.14505101150

Subject(s) - web crawler, world wide web, crawling, computer science, field (mathematics), context (archaeology), subject (documents), information retrieval, web page, data science, geography, medicine, mathematics, archaeology, pure mathematics, anatomy

The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that the field is relatively new and its literature is scattered across a wide range of journal and conference venues, which makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles on the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and expand it by crawling the Web to collect additional documents. The crawled documents are then classified using machine learning techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we can accurately identify which crawled documents are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
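The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier the authors used, so the multinomial Naive Bayes model below is an assumption chosen for brevity. The names `bag_of_words` and `NaiveBayes` are hypothetical, and the four training documents are invented toy data, not the paper's dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(title, abstract):
    """Represent a document as word counts over its title and abstract."""
    return Counter(tokenize(title + " " + abstract))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, bags, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, bag):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word, n in bag.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += n * math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy examples: (title, abstract, is_about_web_archiving).
train = [
    ("Preserving web pages in archives",
     "We study crawling and archiving of web sites for preservation.", True),
    ("Web archive quality",
     "Archived web pages are evaluated for completeness of the crawl.", True),
    ("Protein folding dynamics",
     "Molecular simulation of protein structure and folding.", False),
    ("Crop yield forecasting",
     "Rainfall and soil data are used to forecast crop yield.", False),
]
model = NaiveBayes().fit(
    [bag_of_words(t, a) for t, a, _ in train],
    [label for _, _, label in train],
)

# Classify a crawled document from its title and abstract alone.
candidate = bag_of_words("Archiving the live web",
                         "A crawler collects web pages for a web archive.")
is_relevant = model.predict(candidate)
```

In this sketch the seed bibliography would supply the positive training examples and the crawler would supply the candidates. A real pipeline would likely also need stop-word handling, a much larger training set, and held-out evaluation before its predictions could be trusted.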
