Transferring biological sequence analysis tools to break‐point detection for on‐line monitoring: A control chart based on the local score
Author(s) -
Mercier Sabine
Publication year - 2020
Publication title -
Quality and Reliability Engineering International
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.913
H-Index - 62
eISSN - 1099-1638
pISSN - 0748-8017
DOI - 10.1002/qre.2703
Subject(s) - CUSUM, control chart, statistics, EWMA chart, mathematics, standard deviation, algorithm, statistic, control limits, chart, computer science, process (computing), operating system
The Lindley process, defined in queueing theory, is equivalent to the cumulative sum (CUSUM) process used for break‐point detection in process control. The maximum of the Lindley process, called the local score, is used to highlight atypical regions in biological sequences, and its distribution has been established in several ways. I propose here to use the local score, and also a partial maximum of the Lindley process over the immediate past, to create control charts. The stopping time is the first time at which the p value of the statistic falls below a given threshold α ∈ (0, 1), the instantaneous type I error rate. The local score p value is computed using existing theoretical results. I establish here the exact distribution of the partial maximum of the Lindley process. Performance of the control charts is evaluated by Monte Carlo estimation of the average run length for an in‐control process (ARL0) and for an out‐of‐control process (ARL1). I also use the standard deviation of the run length (SDRL) and the extra quadratic loss (EQL). Comparison with standard and recent control charts from the literature shows that the local score control chart outperforms them, with a much larger ARL0 and an ARL1 that is smaller or of the same order. The local score chart opens several directions for further work: models other than the Gaussian one, Markovian dependence of the data, and simultaneous monitoring of location and dispersion can all be considered.
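
To make the quantities named in the abstract concrete, here is a minimal Python sketch of the Lindley recursion W_k = max(0, W_{k-1} + X_k), the local score (the maximum of that process) and a partial maximum over the most recent values. The function names, the `window` parameter and the example scores are assumptions made for illustration; they are not taken from the paper. The sketch does not compute the p values on which the proposed chart bases its stopping rule.

```python
from typing import List, Sequence


def lindley_process(scores: Sequence[float]) -> List[float]:
    """Return the path W_1..W_n of the Lindley recursion, starting from W_0 = 0.

    W_k = max(0, W_{k-1} + X_k). In a CUSUM setting the increments X_k are
    typically the observations minus a reference value (assumption: the caller
    has already applied that shift).
    """
    w = 0.0
    path: List[float] = []
    for x in scores:
        w = max(0.0, w + x)
        path.append(w)
    return path


def local_score(scores: Sequence[float]) -> float:
    """Local score: the maximum of the Lindley process over the whole sequence."""
    return max(lindley_process(scores), default=0.0)


def partial_maximum(path: Sequence[float], t: int, window: int) -> float:
    """Maximum of the Lindley path over the last `window` values ending at index t.

    `window` is a hypothetical name for the length of the "immediate past".
    """
    start = max(0, t - window + 1)
    return max(path[start : t + 1])


# Illustrative scores only (not data from the paper).
example_scores = [-1, 2, 3, -4, 1, -2]
example_path = lindley_process(example_scores)
print(example_path)                         # [0.0, 2.0, 5.0, 1.0, 2.0, 0.0]
print(local_score(example_scores))          # 5.0
print(partial_maximum(example_path, 5, 2))  # 2.0
```

In this toy run the local score keeps the peak value 5.0 reached at the third observation, whereas the partial maximum over the last two values has already dropped to 2.0. That difference is what separates a statistic over the whole history from one restricted to the immediate past.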