Performs AKO (Aggregation of Multiple Knockoffs)

agg_BH(Ws_mat, fdr = 0.1, offset = 0, gamma = 0.3)

Arguments

Ws_mat

A matrix of test statistics from multiple knockoff filters, where each row represents one set of test statistics and each column represents a variable.

fdr

A numeric value of the target false discovery rate (FDR) level. Default is \(0.1\).

offset

An integer, either 0 or 1, added to the numerator in the empirical p-value calculation; 1 yields a more conservative p-value. Default is \(0\).

gamma

A numeric value in \((0, 1)\) giving the quantile level used to aggregate p-values across the multiple knockoff draws. Default is \(0.3\).

Value

A vector shat containing the indices of selected variables after aggregating knockoff results.

Details

  1. Compute intermediate p-values \(\pi_j^{(b)}\), for all \(j \in [p]\) and \(b \in [B]\): $$ \pi_j^{(b)} = \begin{cases} \frac{\text{offset} + \#\left\{k: W_k^{(b)} \leq -W_j^{(b)} \right\}}{p}, & \text{if } W_j^{(b)} > 0 \\ 1, & \text{if } W_j^{(b)} \leq 0 \end{cases}$$ where \(W_j^{(b)}\) is the test statistic for variable \(j\) from the \(b\)-th knockoff draw.

  2. Aggregate using the quantile aggregation procedure (Meinshausen et al. 2009): $$ \bar{\pi}_j = \min \left\{1, \frac{q_\gamma\left(\left\{\pi_j^{(b)}: b \in [B]\right\}\right)}{\gamma}\right\} $$

  3. Control FDR using Benjamini-Hochberg step-up procedure (BH, Benjamini & Hochberg 1995):

    • Order p-values: \(\bar{\pi}_{(1)} \leq \bar{\pi}_{(2)} \leq \ldots \leq \bar{\pi}_{(p)}\).

    • Find: \(\widehat{k}_{BH} = \max \left\{k: \bar{\pi}_{(k)} \leq \frac{k \alpha}{p}\right\}\), where \(\alpha\) is the target FDR level fdr.

    • Select: \(\widehat{\mathcal{S}} = \left\{j \in [p]: \bar{\pi}_{(j)} \leq \bar{\pi}_{\left(\widehat{k}_{BH}\right)}\right\}\).
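The three steps above can be sketched in base R as follows. This is an illustrative re-implementation, not the package's internal code: the name agg_BH_sketch, the choice of quantile type, and the tie-breaking in the selection step are assumptions.

```r
# Illustrative sketch of agg_BH, assuming Ws_mat is B x p
# (rows = knockoff draws, columns = variables).
agg_BH_sketch <- function(Ws_mat, fdr = 0.1, offset = 0, gamma = 0.3) {
  p <- ncol(Ws_mat)

  # Step 1: intermediate empirical p-values pi_j^(b), one row per draw.
  pvals <- t(apply(Ws_mat, 1, function(W) {
    sapply(W, function(Wj) {
      if (Wj <= 0) 1 else (offset + sum(W <= -Wj)) / p
    })
  }))

  # Step 2: quantile aggregation across draws (Meinshausen et al. 2009);
  # type = 1 (inverse ECDF) is an assumption here.
  pi_bar <- apply(pvals, 2, function(pj) {
    min(1, quantile(pj, probs = gamma, type = 1) / gamma)
  })

  # Step 3: Benjamini-Hochberg step-up at level fdr.
  ord <- order(pi_bar)
  k_hat <- max(c(0, which(pi_bar[ord] <= fdr * seq_len(p) / p)))
  if (k_hat == 0) integer(0) else sort(ord[seq_len(k_hat)])
}
```

With a B x p matrix of strong positive statistics in a few columns and non-positive statistics elsewhere, only the strong columns survive all three steps.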

References

  • Nguyen TB, Chevalier JA, Thirion B, Arlot S. Aggregation of multiple knockoffs. In: International Conference on Machine Learning. PMLR; 2020. p. 7283–93.

  • Tian P, Hu Y, Liu Z et al. Grace-AKO: a novel and stable knockoff filter for variable selection incorporating gene network structures. BMC Bioinformatics.

See also

Other aggregate: agg_Avg(), agg_Freq()

Examples

set.seed(2024)
p <- 100; n <- 80
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = 10, a = 3)
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
res1 <- knockoff.filter(X, y, Xk, statistic = stat.glmnet_coefdiff,
                        offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_coefdiff, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> Frequency of selected variables from 10 knockoff copys:
#>   [1] 10 10 10 10 10  8 10 10  9 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
agg_BH(res1$Ws)
#>  [1]  1  2  3  4  5  6  7  8  9 10