A Diagnostic Procedure for High-Dimensional Data Streams via Missed Discovery Rate Control

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

Monitoring complex systems involving high-dimensional data streams (HDS) provides quick real-time detection of abnormal changes in system performance, but accurate and efficient diagnosis of the streams responsible has also become increasingly important in many data-rich statistical process control applications. Existing diagnostic procedures, designed for low- or moderate-dimensional multivariate processes, may miss too much important information in the out-of-control streams with a high signal-to-noise ratio (SNR) or waste too many resources finding useless in-control streams with a low SNR. In addition, these procedures do not differentiate between streams according to their severity. In this article, we formulate the diagnosis problem of HDS as a multiple testing problem and provide a computationally fast diagnostic procedure that controls the weighted missed discovery rate (wMDR) at a prespecified satisfactory level. The proposed procedure overcomes the limitations of conventional diagnostic procedures by controlling the wMDR while also minimizing the expected number of false positives. We show theoretically that the proposed procedure is asymptotically valid and optimal in a certain sense. Simulation studies and a real data analysis from a semiconductor manufacturing process show that the proposed procedure performs well in practice.
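
To make the trade-off named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' procedure. It assumes a simple empirical analogue of the wMDR (the weighted share of truly out-of-control streams that a rule fails to flag) and a plain thresholding rule on simulated Gaussian stream statistics. The function names, the severity weights, the shift size, and the thresholds are all hypothetical choices made for illustration; the article's formal definition of wMDR and its data-driven procedure may differ.

```python
import random


def weighted_missed_proportion(is_ooc, flagged, weights):
    """Weighted share of truly out-of-control (OOC) streams that were NOT flagged.

    Assumed empirical analogue of a weighted missed discovery rate;
    returns 0.0 when no stream is truly out of control.
    """
    total = sum(w for w, o in zip(weights, is_ooc) if o)
    if total == 0:
        return 0.0
    missed = sum(w for w, o, f in zip(weights, is_ooc, flagged) if o and not f)
    return missed / total


def count_false_positives(is_ooc, flagged):
    """Number of in-control streams that were flagged."""
    return sum(1 for o, f in zip(is_ooc, flagged) if f and not o)


def simulate(n_streams=1000, ooc_fraction=0.05, shift=3.0, threshold=2.0, seed=0):
    """Simulate one diagnosis step with a fixed threshold (hypothetical setup)."""
    rng = random.Random(seed)
    # True state of each stream: out of control with probability ooc_fraction.
    is_ooc = [rng.random() < ooc_fraction for _ in range(n_streams)]
    # Hypothetical severity weights: a missed stream of weight 2 costs twice as much.
    weights = [rng.choice([1.0, 2.0]) for _ in range(n_streams)]
    # Stream statistics: N(shift, 1) if out of control, N(0, 1) otherwise.
    stats = [rng.gauss(shift if o else 0.0, 1.0) for o in is_ooc]
    # Flag a stream when its statistic exceeds the threshold in absolute value.
    flagged = [abs(z) > threshold for z in stats]
    return is_ooc, flagged, weights


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        is_ooc, flagged, weights = simulate(threshold=t)
        wmp = weighted_missed_proportion(is_ooc, flagged, weights)
        fp = count_false_positives(is_ooc, flagged)
        print(f"threshold={t:.1f}  weighted missed proportion={wmp:.3f}  false positives={fp}")
```

Because the seed is fixed, every threshold is applied to the same simulated streams, so lowering the threshold can only reduce the weighted missed proportion and can only increase the number of false positives. A procedure of the kind the abstract describes would aim to keep the first quantity below a target level while making the second as small as possible.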

Original language: English
Pages (from-to): 84-100
Number of pages: 17
Journal: Technometrics
Volume: 62
Issue number: 1
State: Published - 2 Jan 2020

Keywords

  • Big data
  • Data-driven
  • Fault isolation
  • Multiple testing
  • Statistical process control
