
Bootstrapping the Language Archive
Author(s) - Steven Bird
Publication year - 2011
Publication title - Linguistic Issues in Language Technology
Language(s) - English
Resource type - Journals
eISSN - 1945-3590
pISSN - 1945-3604
DOI - 10.33011/lilt.v6i.1243
Subject(s) - computer science, natural language processing, artificial intelligence, natural language, scalability, language identification, linguistics, database
There are grounds to believe that language technology in general, and natural language processing in particular, have important roles to play in creating and analyzing corpora for small languages. This goes beyond developing data management tools: it includes applying natural language processing techniques to small and noisy datasets, and designing new methods that operate within the constraints of linguistic field data. A set of seven such constraints (or "axioms for scalable work with small languages") is presented, and suggestions for further NLP research are related back to these axioms.