Good–Turing frequency estimation in a finite population
Author(s) - Hwang WenHan, Lin ChihWei, Shen TsungJen
Publication year - 2015
Publication title - Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201300168
Subject(s) - turing , computer science , sampling (signal processing) , simple (philosophy) , population , sample (material) , estimation , artificial intelligence , algorithm , machine learning , theoretical computer science , mathematics , statistics , computer vision , philosophy , chemistry , demography , management , epistemology , filter (signal processing) , chromatography , sociology , economics , programming language
Good–Turing frequency estimation (Good, 1953) is a simple, effective method for predicting detection probabilities of objects of both observed and unobserved classes based on the observed frequencies of classes in a sample. The method has been used widely in several disciplines, such as information retrieval, computational linguistics, text recognition, and ecological diversity estimation. Nevertheless, existing studies assume sampling with replacement or sampling from an infinite population, which might be inappropriate for many practical applications. In light of this limitation, this article presents a modification of the Good–Turing estimation method to account for finite population sampling. We provide three practical extensions of the modified method, and we examine the performance of the modified method and its extensions in simulation experiments.
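
For orientation, the classical estimator that the abstract takes as its starting point can be sketched in a few lines of Python. This is only the standard, unsmoothed Good–Turing rule for sampling with replacement (or from an infinite population). It is not the finite-population modification proposed in the article, whose formulas are not reproduced in this record. The function name `good_turing` and the toy sample are assumptions made for illustration.

```python
from collections import Counter


def good_turing(counts):
    """Classical (unsmoothed) Good-Turing estimates.

    counts: mapping from class label to its observed frequency (positive ints).
    Returns (p_unseen, p_by_freq), where p_unseen is the estimated total
    probability of all unobserved classes, and p_by_freq[r] is the estimated
    detection probability of a single class that was observed r times.
    """
    n = sum(counts.values())                    # sample size N
    freq_of_freq = Counter(counts.values())     # n_r: number of classes seen r times

    # Estimated probability mass of unobserved classes: n_1 / N.
    p_unseen = freq_of_freq.get(1, 0) / n

    # Adjusted count r* = (r + 1) * n_{r+1} / n_r, probability r* / N.
    # Without smoothing this is 0 whenever n_{r+1} = 0 (an assumption of
    # this sketch; practical implementations smooth the n_r values first).
    p_by_freq = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        p_by_freq[r] = (r + 1) * n_next / (n_r * n)
    return p_unseen, p_by_freq


# Hypothetical toy sample of 10 detections from 6 observed classes.
sample = ["a", "a", "a", "b", "b", "c", "c", "d", "e", "f"]
p_unseen, p_by_freq = good_turing(Counter(sample))
```

In the toy sample, three classes are seen exactly once, so the estimated probability that the next detection belongs to a previously unobserved class is 3/10. The zero estimate for the most frequent class shows why the raw rule is usually combined with smoothing of the frequency-of-frequency counts.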