Open Access
Not all mementos are created equal: measuring the impact of missing resources
Author(s) - Justin F. Brunelle, Mat Kelly, Hany M. SalahEldeen, Michele C. Weigle, Michael L. Nelson
Publication year - 2015
Publication title - International Journal on Digital Libraries
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.367
H-Index - 32
eISSN - 1432-5012
pISSN - 1432-1300
ISBN - 978-1-4799-5569-5
DOI - 10.1007/s00799-015-0150-6
Subject(s) - computer science, missing data, resource (disambiguation), measure (data warehouse), the internet, web page, web resource, world wide web, information retrieval, data mining, machine learning, computer network
Web archives do not always capture every resource on every page that they attempt to archive, so archived pages (mementos) are often missing a portion of their embedded resources. These embedded resources vary in historical value, utility, and importance. The proportion of missing embedded resources does not accurately measure their impact on the Web page, because some embedded resources are more important to the utility of a page than others. We propose a method to measure the relative value of embedded resources and assign a damage rating to archived pages as a way to evaluate archival success. In this paper, we show that Web users' perceptions of damage are not accurately estimated by the proportion of missing embedded resources. In fact, the proportion of missing embedded resources is a less accurate estimate of resource damage than a random selection. We propose a damage rating algorithm that aligns more closely with Web user perception, improving overall agreement with users on memento damage by 17% and by 51% if the mementos have a damage rating delta greater than 0.30. We use our algorithm to measure damage in the Internet Archive, showing that it is getting better at mitigating damage over time (going from a damage rating of 0.16 in 1998 to 0.13 in 2013). However, we show that a greater number of important embedded resources (2.05 per memento on average) are missing over time. In contrast, the damage in WebCite is increasing over time (going from 0.375 in 2007 to 0.475 in 2014), while the proportion of missing embedded resources remains constant (13% of the resources are missing on average). Finally, we investigate the impact of JavaScript on the damage of the archives, showing that a crawler that can archive JavaScript-dependent representations will reduce memento damage by 13.5%.
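
The distinction between the proportion of missing embedded resources and an importance-weighted damage rating can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the `EmbeddedResource` type, the `weight` field, the function names, and every weight value below are hypothetical, chosen only to show how the two measures can disagree for the same memento.

```python
from dataclasses import dataclass


@dataclass
class EmbeddedResource:
    url: str
    weight: float  # hypothetical importance weight, not taken from the paper
    missing: bool


def proportion_missing(resources):
    """Fraction of embedded resources that are missing (unweighted)."""
    if not resources:
        return 0.0
    return sum(r.missing for r in resources) / len(resources)


def weighted_damage(resources):
    """Share of total importance weight carried by missing resources."""
    total = sum(r.weight for r in resources)
    if total == 0:
        return 0.0
    return sum(r.weight for r in resources if r.missing) / total


# Illustrative memento: one important resource missing, three minor ones present.
resources = [
    EmbeddedResource("main.css", 5.0, True),
    EmbeddedResource("logo.png", 1.0, False),
    EmbeddedResource("spacer.gif", 1.0, False),
    EmbeddedResource("tracker.js", 1.0, False),
]

print(proportion_missing(resources))
print(weighted_damage(resources))
```

With these assumed weights, only a quarter of the resources are missing, yet the missing stylesheet carries most of the total weight, so the weighted measure reports considerably more damage than the raw proportion does.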
