The Influence of Pre-processing on the Estimation of Readability of Web Documents
Author(s) - João Palotti, Guido Zuccon, Allan Hanbury
Publication year - 2015
Publication title - QUT ePrints (Queensland University of Technology)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2806416.2806613
Subject(s) - readability, computer science, personalization, information retrieval, reading (process), world wide web, reading level, web page, natural language processing, linguistics, philosophy, programming language
This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Previous work has highlighted readability as an important aspect of web search result personalisation. The most widely used text readability measures rely on surface-level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has an important implication for search engines: search result personalisation strategies that consider users' reading ability may fail if incorrect text readability estimations are computed.
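As an illustration of why surface-level measures are sensitive to text extraction, the following minimal sketch computes the Automated Readability Index (ARI), one well-known measure based on word and sentence length, on two versions of the same passage. The choice of ARI, the regex-based tokenisation, and both sample strings are assumptions made for this example; they are not taken from the paper, and the "noisy" string is only a hypothetical imitation of an extractor that drops sentence punctuation and keeps navigation text.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the ARI score: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    # Naive sentence splitting on terminal punctuation (an assumption of this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Cleanly extracted text: sentence boundaries are preserved.
clean = (
    "Take one tablet daily. Drink water with it. "
    "Call your doctor if pain persists."
)

# Hypothetical noisy extraction: menu labels are kept and punctuation is lost,
# so the whole passage is treated as one long sentence.
noisy = (
    "Home About Contact Take one tablet daily Drink water with it "
    "Call your doctor if pain persists"
)

clean_score = automated_readability_index(clean)
noisy_score = automated_readability_index(noisy)

print(f"clean extraction ARI: {clean_score:.2f}")
print(f"noisy extraction ARI: {noisy_score:.2f}")
```

In this toy case the noisy version scores as harder to read, because losing sentence boundaries inflates the words-per-sentence term even though the underlying content is unchanged.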