Optimal screening and discovery of sparse signals with applications to multistage high throughput studies
Author(s) - T. Tony Cai, Wenguang Sun
Publication year - 2017
Publication title - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.523
H-Index - 137
eISSN - 1467-9868
pISSN - 1369-7412
DOI - 10.1111/rssb.12171
Summary - A common feature in large‐scale scientific studies is that signals are sparse, and it is desirable to narrow the focus significantly, in a sequential manner, to a much smaller subset. We consider two related data screening problems: one is to find the smallest subset that contains virtually all signals, and the other is to find the largest subset that contains essentially only signals. These screening problems are closely connected to, but distinct from, the more conventional signal detection and multiple‐testing problems. We develop phase transition diagrams to characterize the fundamental limits in simultaneous inference, and we derive data‐driven screening procedures that control the error rates with near‐optimality properties. Applications in the context of multistage high throughput studies are discussed.
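The difference between the two screening problems can be illustrated with a small Python sketch. This is not the authors' procedure. It assumes a known two-group normal mixture (nulls drawn from N(0, 1), signals from N(mu, 1), signal proportion p), ranks units by their local false discovery rate, and then applies two different stopping rules. The "discovery" rule keeps the largest top-ranked set whose estimated false discovery proportion is at most alpha. The "screening" rule keeps the smallest top-ranked set whose estimated proportion of missed signals is at most gamma. The helper names `lfdr`, `discovery_set` and `screening_set`, and the plug-in error estimates they use, are hypothetical choices made for illustration.

```python
import math


def lfdr(z, p, mu):
    """Local fdr P(null | z), assuming nulls ~ N(0,1) and signals ~ N(mu,1) with proportion p."""
    # t = log posterior odds of "signal" = log(p / (1 - p)) + log f1(z) / f0(z)
    t = math.log(p / (1.0 - p)) + mu * z - 0.5 * mu * mu
    # lfdr = 1 / (1 + exp(t)), evaluated so that exp() never overflows
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def _ranked(zs, p, mu):
    """Return (lfdr, index) pairs, most signal-like first."""
    return sorted((lfdr(z, p, mu), i) for i, z in enumerate(zs))


def discovery_set(zs, p, mu, alpha):
    """Largest top-ranked set whose estimated FDP (mean lfdr of the kept units) is <= alpha."""
    total, chosen = 0.0, []
    for k, (l, i) in enumerate(_ranked(zs, p, mu), start=1):
        total += l
        if total / k > alpha:  # the running mean of sorted lfdr values never decreases
            break
        chosen.append(i)
    return sorted(chosen)


def screening_set(zs, p, mu, gamma):
    """Smallest top-ranked set whose estimated share of missed signals is <= gamma."""
    ranked = _ranked(zs, p, mu)
    expected_signals = sum(1.0 - l for l, _ in ranked)
    missed = expected_signals  # expected number of signals outside the set
    chosen = []
    for l, i in ranked:
        if missed <= gamma * expected_signals:
            break
        chosen.append(i)
        missed -= 1.0 - l
    return sorted(chosen)


if __name__ == "__main__":
    zs = [0.1, -0.5, 3.2, 4.0, 0.8, 2.5, -1.2, 3.8]
    print(discovery_set(zs, p=0.3, mu=3.0, alpha=0.01))  # [2, 3, 7]
    print(screening_set(zs, p=0.3, mu=3.0, gamma=0.05))  # [2, 3, 5, 7]
```

Both rules use the same ranking and differ only in where they stop. In this toy example the screening set retains the borderline unit with z = 2.5, which the stricter discovery set excludes. This mirrors the contrast drawn above between a set that contains virtually all signals and a set that contains essentially only signals. In practice, p and mu would have to be estimated from the data.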