fdrSAFE: Selective Aggregation for Local False Discovery Rate Estimation
Citation: Jenna M. Landy and Giovanni Parmigiani. “fdrSAFE: Selective Aggregation for Local False Discovery Rate Estimation.” arXiv preprint arXiv:2401.12865 (2024).
R Software Package: fdrSAFE
Estimating local false discovery rates (fdr) is central to large-scale multiple hypothesis testing, yet different methods often produce divergent results, and there is little guidance for selecting among them. Because ground truth hypothesis labels are unobservable, standard model selection cannot be used. We present fdrSAFE (selective aggregation for fdr estimation), a data-driven selective ensembling approach that estimates the performance of each candidate method on synthetic datasets designed to resemble the observed data but with known ground truth. Using simulation studies and an experimental spike-in transcriptomic dataset, we show that fdrSAFE achieves robust near-optimality, performing well across diverse settings in which the relative performance of the baseline methods varies. Along with improved fdr estimates, this framework enhances replicability by replacing arbitrary model choice with a principled, data-adaptive procedure. An open-source R software package is available on GitHub at jennalandy/fdrSAFE.
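The abstract describes the approach only at a high level. As a rough illustration of the general idea, here is a minimal Python sketch of selective aggregation driven by performance on labeled synthetic data. It is not the fdrSAFE package's API (the package is written in R) and not the paper's exact procedure. Everything specific below is a hypothetical assumption: the candidate "methods" are toy two-group models (N(0, 1) null, N(mu1, 1) alternative) that differ only in their assumed alternative mean, synthetic data are drawn from a crude two-group fit to the observed z-scores, performance is scored with the Brier score (mean squared error between the estimated fdr and the true null indicator), and the top `k` candidates are averaged with equal weights.

```python
import math
import random
from statistics import mean

# Hypothetical choices for illustration only; not taken from fdrSAFE.
LAMBDA = 1.0
P_NULL_INSIDE = math.erf(LAMBDA / math.sqrt(2))  # P(|Z| < LAMBDA) under N(0, 1)

def norm_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def estimate_pi0(zs):
    # Storey-style null proportion estimate from z-scores near zero.
    inside = sum(abs(z) < LAMBDA for z in zs) / len(zs)
    return min(1.0, inside / P_NULL_INSIDE)

def make_candidate(mu1):
    # Hypothetical candidate method: two-group local fdr with an assumed
    # alternative mean mu1 and pi0 estimated from the data it is given.
    def estimate(zs):
        pi0 = estimate_pi0(zs)
        out = []
        for z in zs:
            f0 = norm_pdf(z)
            f = pi0 * f0 + (1 - pi0) * norm_pdf(z, mu1)
            out.append(min(1.0, pi0 * f0 / f))
        return out
    return estimate

def simulate_like(zs, n, rng):
    # Assumed generator: fit a crude two-group model to the observed z-scores,
    # then draw n labeled z-scores from it (label 1.0 = null, 0.0 = non-null).
    pi0 = estimate_pi0(zs)
    tail = [z for z in zs if z > 2.0]
    mu1 = mean(tail) if tail else 3.0
    sims, labels = [], []
    for _ in range(n):
        is_null = rng.random() < pi0
        labels.append(1.0 if is_null else 0.0)
        sims.append(rng.gauss(0.0, 1.0) if is_null else rng.gauss(mu1, 1.0))
    return sims, labels

def brier(fdr, labels):
    # Mean squared error between estimated fdr and the true null indicator.
    return mean((f - y) ** 2 for f, y in zip(fdr, labels))

def selective_aggregate(zs, candidates, k=2, n_synth=20, seed=0):
    rng = random.Random(seed)
    losses = {name: [] for name in candidates}
    # Score every candidate on synthetic datasets where the truth is known.
    for _ in range(n_synth):
        sims, labels = simulate_like(zs, len(zs), rng)
        for name, method in candidates.items():
            losses[name].append(brier(method(sims), labels))
    # Keep the k candidates with the lowest average loss ...
    chosen = sorted(candidates, key=lambda name: mean(losses[name]))[:k]
    # ... and average their fdr estimates on the observed data.
    per_method = [candidates[name](zs) for name in chosen]
    ensemble = [mean(vals) for vals in zip(*per_method)]
    return chosen, ensemble

# Usage on simulated "observed" z-scores: 900 nulls, 100 non-nulls.
rng = random.Random(1)
observed = [rng.gauss(0, 1) for _ in range(900)] + [rng.gauss(3, 1) for _ in range(100)]
candidates = {f"alt_mean_{m}": make_candidate(m) for m in (1.0, 2.0, 3.0, 4.0)}
chosen, fdr = selective_aggregate(observed, candidates, k=2)
print(chosen)
```

The key design point the sketch tries to capture is that model selection happens on synthetic data with known labels rather than on the observed data, where labels are unavailable. Averaging a selected subset, rather than committing to the single best-scoring candidate, is one way to hedge against noise in those estimated performances.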
Advised by Giovanni Parmigiani, PhD
Department of Data Science, Dana-Farber Cancer Institute
Department of Biostatistics, Harvard T.H. Chan School of Public Health