Open Access
Das Internet als linguistisches Korpus
Author(s) - Hans Bickel
Publication year - 2006
Publication title - linguistik online
Language(s) - English
Resource type - Journals
ISSN - 1615-3014
DOI - 10.13092/lo.28.612
Subject(s) - computer science, German, the Internet, corpus linguistics, natural language processing, linguistics, artificial intelligence, text corpus, linguistic analysis, World Wide Web, philosophy
This article discusses whether the Internet can be used as a linguistic corpus. It draws on experience gained while compiling the Variantenwörterbuch des Deutschen (Dictionary of Standard German Variants) between 1997 and 2004. Identifying the national and regional variants of German used in Germany, Austria and Switzerland required a large linguistic corpus that could also provide frequency data for rather rare words. The question was therefore: Is the Internet suitable as a corpus for linguistic frequency analysis? The WWW can serve as a corpus only (1) if reliable and reproducible results can be obtained, and (2) if the results closely reflect the language as it is actually used. The test showed that the Internet is an extremely useful corpus for obtaining information on word frequency. Its enormous size and the large number of different text types make it an extremely versatile corpus that is systematically related to actual written usage.
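The two conditions named above (reproducible results, and frequencies that reflect actual usage) can be made concrete with a small sketch. This is not the author's procedure. It assumes, hypothetically, that hit counts for competing variants (for example Austrian "Jänner" versus "Januar") have already been collected separately for each region, that the region labels ("AT", "DE", "CH") are merely illustrative subcorpus names, and that reproducibility is judged by repeating the same query and measuring the relative spread of the counts with a coefficient of variation.

```python
from statistics import mean, pstdev


def variant_shares(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """For each region, return each variant's share of that region's total hits.

    `counts` maps a hypothetical region label (e.g. "AT") to a mapping of
    variant -> hit count. Shares are used instead of raw counts because the
    regional subcorpora differ greatly in size.
    """
    shares: dict[str, dict[str, float]] = {}
    for region, by_variant in counts.items():
        total = sum(by_variant.values())
        shares[region] = {
            variant: (n / total if total else 0.0)
            for variant, n in by_variant.items()
        }
    return shares


def coefficient_of_variation(runs: list[int]) -> float:
    """Relative spread of repeated hit counts for one query.

    0.0 means every repetition returned the same count (fully reproducible);
    larger values indicate less stable results. Assumes a non-empty list.
    """
    m = mean(runs)
    return pstdev(runs) / m if m else 0.0
```

A variant whose share is high in one region and low in the others would be a candidate for a regional label, but only if its counts stay stable across repeated queries. Comparing shares rather than absolute hits is a design choice made here to offset the very different amounts of text available per country.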