Abstract
High-throughput biomedical technology enables the simultaneous measurement of thousands of gene expression levels. A major task in analyzing such gene expression data is to identify both over-expressed and under-expressed genes. The popular two-group models select non-null genes without further classifying them as over-expressed or under-expressed. Consequently, two-group decision rules cannot separately constrain the numbers of falsely discovered over-expressed and under-expressed genes. We propose a general three-group model that allows dependence between the test statistics and develop a decision rule that separately controls the two types of false discoveries. We show that the optimal decision rule in our three-group model has a special monotonic structure. Using this monotonic structure, we linearize the two-directional false discovery rate constraints. We prove that our decision rule maximizes the expected number of true discoveries while simultaneously controlling the proportions of falsely discovered over-expressed and under-expressed genes at the desired levels. We also propose data-driven versions of the procedures and establish their consistency. Comparisons with state-of-the-art approaches and applications to genomic studies show that our procedures work well.
| Original language | English |
|---|---|
| Article number | e10329 |
| Journal | Statistics in Medicine |
| Volume | 44 |
| Issue number | 5 |
| DOIs | |
| State | Published - 28 Feb 2025 |
Keywords
- monotone likelihood ratio
- multiple tests
- signal classification
- three-group models
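
Illustrative sketch

The abstract describes a rule that controls false discoveries separately for over-expressed and under-expressed genes. As a rough illustration of that idea only, and not the procedure developed in the paper, the Python sketch below applies a generic step-up rule on posterior probabilities within each direction. It assumes that posterior probabilities of over-expression and under-expression have already been estimated for every gene; the function names, argument names, and the tie-breaking rule are all hypothetical.

```python
import numpy as np


def directional_step_up(p_wrong, alpha):
    """Generic step-up rule (an assumption, not the paper's rule).

    p_wrong[i] is the posterior probability that declaring gene i in this
    direction would be a false discovery. Returns the positions of the
    largest set, taken in ascending order of p_wrong, whose average
    p_wrong does not exceed alpha.
    """
    order = np.argsort(p_wrong)
    running_mean = np.cumsum(p_wrong[order]) / np.arange(1, len(p_wrong) + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    k = passing[-1] + 1 if passing.size else 0
    return order[:k]


def classify_three_group(post_over, post_under, alpha_over, alpha_under):
    """Label each gene +1 (over), -1 (under), or 0 (no discovery).

    post_over and post_under are hypothetical inputs: estimated posterior
    probabilities that each gene is over- or under-expressed.
    """
    post_over = np.asarray(post_over, dtype=float)
    post_under = np.asarray(post_under, dtype=float)
    labels = np.zeros(len(post_over), dtype=int)

    # Each gene is a candidate only for its more probable direction,
    # so no gene can be declared in both directions.
    over_cand = np.nonzero(post_over >= post_under)[0]
    under_cand = np.nonzero(post_under > post_over)[0]

    chosen = directional_step_up(1.0 - post_over[over_cand], alpha_over)
    labels[over_cand[chosen]] = 1

    chosen = directional_step_up(1.0 - post_under[under_cand], alpha_under)
    labels[under_cand[chosen]] = -1
    return labels
```

Calling `classify_three_group(post_over, post_under, 0.1, 0.1)` returns one label per gene. The two nominal levels are handled independently, which mirrors the separate control of the two error types described above, but the sketch does not model dependence between test statistics and makes no optimality claim.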