

Open Access

Korpusbasierte Wörterbucharbeit mit den Daten des Projekts Deutscher Wortschatz (Corpus-Based Dictionary Work with the Data of the Deutscher Wortschatz Project)

Author(s) - Uwe Quasthoff

Publication year - 2009

Publication title - linguistik online

Language(s) - English

Resource type - Journals

ISSN - 1615-3014

DOI - 10.13092/lo.39.484

Subject(s) - collocation, computer science, neologism, natural language processing, thesaurus, vocabulary, artificial intelligence, german, machine readable dictionary, preprocessor, word, information retrieval, linguistics, philosophy, machine learning

The corpus project Deutscher Wortschatz (German Vocabulary) at Leipzig University has been collecting and processing textual data for 15 years. The corpus now comprises approx. 2 billion running words in 160 million sentences. The dictionary is available online at www.wortschatz.uni-leipzig.de and also contains word co-occurrence data.

The pre-processing of the data relied mainly on language-independent methods, which were also applied to corpora in other languages.

The paper describes the production process for three dictionaries that draw on these corpus data: a thesaurus, a dictionary of neologisms, and a collocation dictionary. In all cases, the raw data for the dictionary entries were produced automatically, and the final entries were written solely on the basis of these pre-selections. For the thesaurus, the pre-processing consisted of corpus-based detection of semantically similar words. For the neologism dictionary, yearly frequency information was used, and for the collocation dictionary, word co-occurrences and part-of-speech information were combined.
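
The abstract does not say how yearly frequency information was turned into neologism candidates. As a rough illustration only, the following Python sketch assumes a hypothetical input of (year, sentence) pairs and a simple rule: a word is a candidate if it never occurs before a chosen year and occurs at least `min_count` times from that year on. The function names, the whitespace tokenization, and the threshold are assumptions made for this example, not the project's actual procedure.

```python
from collections import Counter


def yearly_frequencies(dated_sentences):
    """Count lowercase whitespace tokens per year.

    dated_sentences: iterable of (year, sentence) pairs.
    This input format is a hypothetical simplification.
    """
    counts = {}
    for year, sentence in dated_sentences:
        counts.setdefault(year, Counter()).update(sentence.lower().split())
    return counts


def neologism_candidates(counts, first_year, min_count=2):
    """Return words unseen before first_year that occur at least
    min_count times from first_year onward (assumed criterion)."""
    before = Counter()
    after = Counter()
    for year, counter in counts.items():
        (after if year >= first_year else before).update(counter)
    return sorted(
        word for word, count in after.items()
        if count >= min_count and before[word] == 0
    )


# Tiny made-up sample; real input would be the dated corpus sentences.
sample = [
    (2003, "das Telefon klingelt"),
    (2004, "das Telefon ist alt"),
    (2007, "das Smartphone klingelt"),
    (2008, "das Smartphone ist neu"),
]
candidates = neologism_candidates(yearly_frequencies(sample), first_year=2006)
print(candidates)
```

In line with the abstract, such automatically produced lists would serve only as pre-selections; the final dictionary entries were written by hand on that basis.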
