The frequency spectrum of finite samples from the intermittent silence process
Author(s) -
Ferrer-i-Cancho Ramon,
Gavaldà Ricard
Publication year - 2009
Publication title -
Journal of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1532-2890
pISSN - 1532-2882
DOI - 10.1002/asi.21033
It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so‐called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a given frequency, and on the vocabulary size, i.e., the number of different words in a text generated by an intermittent silence process. We derive the expected frequency spectrum and the expected vocabulary size as functions of the text size, and explain how to calculate them accurately and efficiently.
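To illustrate the quantities the abstract refers to, here is a minimal Python sketch. It is not the authors' derivation or algorithm. It assumes one particular variant of the intermittent silence process: N equiprobable letters, a space emitted after each letter with probability p (so no word is empty), and text size T counted in word tokens, which are then independent draws. Under those assumptions, each of the N^l words of length l has probability pi_l = (1 - p)^(l-1) p / N^l. The standard occupancy (binomial) expectations then give E[V(f, T)] = sum over l of N^l C(T, f) pi_l^f (1 - pi_l)^(T - f) and E[V(T)] = sum over l of N^l (1 - (1 - pi_l)^T). The function names, the tolerance `tol` and the truncation rule are hypothetical choices made for this sketch. The infinite sum over l is cut once a geometric bound on the remaining terms falls below `tol`, and all terms are computed in log space so that N^l never overflows.

```python
import math


def log_word_prob(length: int, n_letters: int, p_space: float) -> float:
    """Log-probability of one particular word with `length` letters.

    Assumed variant: letters are equiprobable and, after every letter, the word
    ends (a space is emitted) with probability p_space, so no word is empty.
    """
    return ((length - 1) * math.log1p(-p_space) + math.log(p_space)
            - length * math.log(n_letters))


def max_length(n_tokens: int, n_letters: int, p_space: float, tol: float) -> int:
    """Smallest length L such that all words longer than L contribute < tol.

    Once T * pi_{L+1} <= 1, each omitted term is at most N**l * T * pi_l,
    and these bounds sum to T * (1 - p_space)**L over all l > L.
    """
    length = 1
    while (n_tokens * (1.0 - p_space) ** length >= tol
           or math.log(n_tokens) + log_word_prob(length + 1, n_letters, p_space) > 0.0):
        length += 1
    return length


def expected_spectrum(f: int, n_tokens: int, n_letters: int, p_space: float,
                      tol: float = 1e-12) -> float:
    """E[V(f, T)]: expected number of distinct words occurring exactly f times."""
    if not 1 <= f <= n_tokens:
        return 0.0
    log_binom = (math.lgamma(n_tokens + 1) - math.lgamma(f + 1)
                 - math.lgamma(n_tokens - f + 1))
    total = 0.0
    for length in range(1, max_length(n_tokens, n_letters, p_space, tol) + 1):
        log_pi = log_word_prob(length, n_letters, p_space)
        # N**length words share probability pi; the count of each one among
        # T independent tokens is Binomial(T, pi).
        log_term = (length * math.log(n_letters) + log_binom + f * log_pi
                    + (n_tokens - f) * math.log1p(-math.exp(log_pi)))
        total += math.exp(log_term)
    return total


def expected_vocabulary(n_tokens: int, n_letters: int, p_space: float,
                        tol: float = 1e-12) -> float:
    """E[V(T)]: expected number of distinct words among T tokens."""
    total = 0.0
    for length in range(1, max_length(n_tokens, n_letters, p_space, tol) + 1):
        log_pi = log_word_prob(length, n_letters, p_space)
        if log_pi > -600.0:
            # log(1 - (1 - pi)**T), accurate even when pi is tiny.
            log_seen = math.log(-math.expm1(n_tokens * math.log1p(-math.exp(log_pi))))
        else:
            # pi would underflow; T * pi is negligible next to 1 here,
            # so 1 - (1 - pi)**T is approximated by T * pi.
            log_seen = math.log(n_tokens) + log_pi
        total += math.exp(length * math.log(n_letters) + log_seen)
    return total


if __name__ == "__main__":
    n_letters, p_space = 26, 0.2  # illustrative parameters, not taken from the paper
    for n_tokens in (10, 100, 1000):
        v = expected_vocabulary(n_tokens, n_letters, p_space)
        v1 = expected_spectrum(1, n_tokens, n_letters, p_space)
        print(f"T={n_tokens}: E[V(T)]={v:.3f}, E[V(1, T)]={v1:.3f}")
```

Two cheap consistency checks follow from the formulas: summing E[V(f, T)] over f = 1..T gives E[V(T)], and summing f E[V(f, T)] over the same range gives T, both up to the truncation tolerance.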