Automatic Genre Identification: Towards a Flexible Classification Scheme
Author(s) - Marina Santini
Publication year - 2007
Publication title - Electronic Workshops in Computing
Language(s) - English
Resource type - Conference proceedings
ISSN - 1477-9358
DOI - 10.14236/ewic/fdia2007.1
Subject(s) - scheme (mathematics), computer science, classification scheme, identification (biology), identification scheme, artificial intelligence, information retrieval, natural language processing, data mining, mathematics, biology, measure (data warehouse), mathematical analysis, botany
This paper presents an automatic genre classification model that implements a flexible classification scheme, i.e. a scheme capable of performing zero-, one- or multi-genre assignment. I suggest that this scheme is more appropriate for genres on the web, because many web pages have more than one genre, or none at all. The model that I propose relies on the distinction between the concepts of 'text type' and 'genre', both of which are 'inferred' rather than 'learned' from pre-labelled examples. The main drawback of this approach is that it cannot be fully evaluated given the limitations of current genre research. However, I present a partial evaluation showing that the model performs competitively and remains stable when re-scaled.
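To make the idea of zero-, one- or multi-genre assignment concrete, the sketch below shows one simple way such an output could be produced. It is an illustration only and is not the inferential model described in the paper: the function `assign_genres`, the threshold value, the genre names and the per-genre scores are all hypothetical, chosen purely to show the three possible outcomes.

```python
from typing import Dict, List


def assign_genres(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every genre whose score reaches the threshold.

    The result may be empty (zero genres), hold a single label
    (one genre), or hold several labels (multi-genre).
    """
    return sorted(genre for genre, score in scores.items() if score >= threshold)


# Hypothetical per-genre scores for three web pages (assumed values).
pages = {
    "page_a": {"blog": 0.9, "faq": 0.1, "eshop": 0.2},  # one clear genre
    "page_b": {"blog": 0.7, "faq": 0.6, "eshop": 0.1},  # two genres
    "page_c": {"blog": 0.2, "faq": 0.3, "eshop": 0.1},  # no genre
}

assignments = {name: assign_genres(scores) for name, scores in pages.items()}

for name, genres in assignments.items():
    print(name, genres if genres else "no genre assigned")
```

A single-label classifier would be forced to pick exactly one genre for every page; returning a list lets a page such as the hypothetical `page_c` remain unlabelled.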