
Web‐crawling reliability
Author(s) - Cothey Viv
Publication year - 2004
Publication title - Journal of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1532-2890
pISSN - 1532-2882
DOI - 10.1002/asi.20078
Subject(s) - crawling, computer science, web crawler, world wide web, reliability (semiconductor), web page, information retrieval, biology, power (physics), physics, anatomy, quantum mechanics
In this article, I investigate the reliability, in the social science sense, of collecting informetric data about the World Wide Web by Web crawling. The investigation includes a critical examination of the practice of Web crawling and contrasts the results of content crawling with the results of link crawling. It is shown that Web crawling by search engines is intentionally biased and selective. I also report the results of a large‐scale experimental simulation of Web crawling that illustrates the effects of different crawling policies on data collection. It is concluded that the reliability of Web crawling as a data collection technique is improved by fuller reporting of relevant crawling policies.