Word length, sentence length and frequency – Zipf revisited

Author(s) - Bengt Sigurd, Mats Eeg-Olofsson, Joost van Weijer

Publication year - 2004

Publication title - Studia Linguistica

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.187

H-Index - 28

eISSN - 1467-9582

pISSN - 0039-3193

DOI - 10.1111/j.0039-3193.2004.00109.x

Subject(s) - Zipf's law, word (group theory), sentence, word length, word lists by frequency, distribution (mathematics), mathematics, German, relation (database), linguistics, computer science, natural language processing, statistics, mathematical analysis, philosophy, geometry, database

This paper examines data from English, Swedish and German in order to find a theoretical distribution that describes the observed relation between word length and frequency. In Swedish and English, most word tokens consist of only three letters, while shorter and longer words occur less frequently. We found that an equation of the general form $f_{\text{exp}} = a \cdot L^{b} \cdot c^{L}$, where $L$ is the length (a variant of the so-called gamma distribution), approximates the observed frequencies reasonably well. This formula incorporates both the fact that the number of possible words increases with word length and the fact that longer words tend to be avoided, presumably because they are uneconomic. To our knowledge, this formula has not previously been proposed to describe word frequency data. We examined frequency distributions of word length in Swedish and English, and explored different variants of the equation by systematically varying the parameters a, b and c. Subsequently, we also applied the formula to the frequency distribution of sentence length in English, and found an almost perfect fit for a corpus consisting of different text genres. Moreover, the data showed that the formula can be used to distinguish between different text genres.
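To make the shape of the formula concrete, here is a minimal Python sketch of f_exp = a * L^b * c^L. It rests on several assumptions that are not taken from the paper: the function names are hypothetical, the parameter values (a = 1000, b = 3, c = e^-1) are purely illustrative, and the data are synthetic, generated from the formula itself. The fitting step uses ordinary least squares on the log-linear form ln f = ln a + b ln L + L ln c, which is one common way to estimate such parameters and not necessarily the procedure the authors used (the abstract only says they systematically varied a, b and c).

```python
import math


def expected_frequency(length, a, b, c):
    """Evaluate f_exp = a * L**b * c**L for a length L >= 1."""
    return a * length ** b * c ** length


def mode_length(b, c):
    """Length at which f_exp peaks, treating L as continuous.

    Setting d/dL (ln a + b ln L + L ln c) = b/L + ln c to zero
    gives L = -b / ln c, which is positive when b > 0 and 0 < c < 1.
    """
    return -b / math.log(c)


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_log_linear(lengths, freqs):
    """Least-squares fit of ln f = ln a + b ln L + L ln c; returns (a, b, c)."""
    rows = [(1.0, math.log(L), float(L)) for L in lengths]
    ys = [math.log(f) for f in freqs]
    # Normal equations: (X^T X) beta = X^T y, with beta = (ln a, b, ln c).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    ln_a, b, ln_c = solve_3x3(xtx, xty)
    return math.exp(ln_a), b, math.exp(ln_c)


# Illustrative parameters only (assumed, not reported in the paper).
TRUE_A, TRUE_B, TRUE_C = 1000.0, 3.0, math.exp(-1.0)

# Synthetic, noise-free "frequencies" for lengths 1..15.
lengths = list(range(1, 16))
freqs = [expected_frequency(L, TRUE_A, TRUE_B, TRUE_C) for L in lengths]

a_hat, b_hat, c_hat = fit_log_linear(lengths, freqs)
print("fitted a, b, c:", a_hat, b_hat, c_hat)
print("peak length:", mode_length(b_hat, c_hat))
```

With these assumed values the curve peaks at L = -b / ln c = 3, which mirrors the abstract's observation that three-letter tokens are the most frequent. The L^b factor captures the growth in the number of possible words with length, and the c^L factor with c < 1 captures the avoidance of long words. On real corpus counts, zero frequencies would have to be excluded or smoothed before taking logarithms, and a least-squares fit in log space weights rare lengths more heavily than a fit on raw counts would.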
