Open Access

Factors affecting website reconstruction from the web infrastructure

Author(s) - Frank McCown, Norou Diawara, Michael L. Nelson

Publication year - 2007

Publication title - ODU Digital Commons (Old Dominion University)

Language(s) - English

Resource type - Conference proceedings

ISSN - 2575-7865

DOI - 10.1145/1255175.1255182

Subject(s) - web crawler, backup, pagerank, computer science, world wide web, resource, search engine, web search engine, web resource, information retrieval, database, web search query, computer network

When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for missing content. In this paper we describe an experiment in which we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler, Warrick, which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We examine several characteristics of the websites over time, including birth rate, decay, and age of resources. We evaluate the reconstructions against the crawled sites and develop a statistical model for predicting reconstruction success from the WI. On average, we were able to recover 61% of each website's resources. We found that Google's PageRank, number of hops, and resource age were the three most significant factors in determining whether a resource would be recovered from the WI.
