The Knockoff Filter — knockoff.filter • zKnock

This function runs the knockoff filter: it selects the variables relevant for predicting the outcome of interest by comparing each predictor with its knockoff copies through test statistics, targeting the false discovery rate given by fdr.

knockoff.filter(
  X,
  y,
  Xk = NULL,
  statistic = stat.glmnet_coefdiff,
  aggregate = agg_Freq,
  fdr = 0.1,
  offset = 1,
  verbose = FALSE,
  ...
)

Arguments

X: A numeric n-by-p matrix or data frame of predictors.
y: A response vector of length n.
Xk: A list of knockoff copies of X (default: NULL).
statistic: A function to compute test statistics (default: stat.glmnet_coefdiff).
aggregate: Function to aggregate results from multiple knockoffs (default: agg_Freq).
fdr: Target false discovery rate (default: 0.1).
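offset: Offset used in the threshold computation (0 or 1; default: 1). In the standard knockoff filter, 1 gives the more conservative knockoff+ threshold, while 0 gives a more liberal one.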
verbose: Logical; if TRUE, prints progress messages during knockoff generation and statistic calculation (default: FALSE).
...: Additional arguments passed to the statistic function.
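
To make the roles of fdr and offset concrete, here is a minimal sketch of the threshold rule commonly used with knockoff statistics: the smallest t for which (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most fdr. It assumes zKnock follows this standard rule; knockoff_threshold_sketch and the toy vector W are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of zKnock: the usual knockoff threshold rule.
knockoff_threshold_sketch <- function(W, fdr = 0.1, offset = 1) {
  stopifnot(offset %in% c(0, 1))
  # Candidate thresholds: 0 and every observed magnitude of W.
  ts <- sort(c(0, abs(W)))
  # Estimated false discovery proportion at each candidate threshold.
  ratio <- vapply(ts, function(t) {
    (offset + sum(W <= -t)) / max(1, sum(W >= t))
  }, numeric(1))
  ok <- which(ratio <= fdr)
  # Smallest admissible threshold, or Inf when no threshold meets the target.
  if (length(ok) > 0) ts[ok[1]] else Inf
}

# Toy statistics: large positive values favour the original variable.
W <- c(3, 2.5, 2, 1.5, 1, -0.5, 0.2, -0.1, 4, 3.5)

thr1 <- knockoff_threshold_sketch(W, fdr = 0.2, offset = 1)  # 1
sel1 <- which(W >= thr1)                                     # 1 2 3 4 5 9 10

thr0 <- knockoff_threshold_sketch(W, fdr = 0.2, offset = 0)  # 0.2
sel0 <- which(W >= thr0)                                     # 1 2 3 4 5 7 9 10
```

With offset = 0 the threshold drops and one more variable is selected, which is why offset = 1 is the more conservative choice.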

Value

An object of class knockoff.filter, containing:

Single Knockoff Case:
- call: The matched call of the function.
- W: The test statistics for the original variables.
- threshold: The computed selection threshold.
- shat: The indices of variables selected based on the threshold.

Multiple Knockoff Case:
- call: The matched call of the function.
- shat: The aggregated indices of selected variables.
- Ws: The matrix of test statistics for multiple knockoff copies.
- thresholds: A vector of thresholds for each knockoff.
- shat_list: A list where each element contains the indices of selected variables for a corresponding knockoff copy.
- shat_mat: A binary matrix where each row indicates the selected variables for a specific knockoff copy (1 for selected, 0 for not selected).
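
The relationship between shat_list, shat_mat and the aggregated shat can be sketched as follows. This is an illustration only, not the implementation of agg_Freq: the majority cutoff of one half is an assumption, and the toy selections are made up.

```r
# Hypothetical illustration, not zKnock's agg_Freq: majority-vote aggregation.
p <- 5
shat_list <- list(c(1, 2), c(1, 3), c(1, 2, 4))  # selections from 3 knockoff copies

# One row per knockoff copy, one column per variable (1 = selected, 0 = not).
shat_mat <- t(vapply(shat_list,
                     function(s) as.numeric(seq_len(p) %in% s),
                     numeric(p)))

# Number of copies in which each variable was selected.
freq <- colSums(shat_mat)

# Assumed cutoff: keep variables selected in more than half of the copies.
shat <- which(freq / nrow(shat_mat) > 0.5)
```

The vector freq plays the role of the "Frequency of selected variables" line in the printed output of the examples.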

References

Candes, E., Fan, Y., Janson, L., & Lv, J. (2018). Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3), 551-577.

Examples

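# Linear Regression
library(zKnock)
set.seed(2024)
n <- 80; p <- 100; k <- 10   # n observations, p predictors, k passed as p_nn below
Ac <- 1:k                    # first index set passed to perf_eval
Ic <- (k + 1):p              # second index set passed to perf_eval
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = k, a = 3)
# Build 10 knockoff copies of X
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
res1 <- knockoff.filter(X, y, Xk, statistic = stat.glmnet_coefdiff,
                        offset = 1, fdr = 0.1)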
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_coefdiff, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> Frequency of selected variables from 10 knockoff copys:
#>   [1] 10 10 10 10 10  8 10 10  9 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
perf_eval(res1$shat,Ac,Ic)
#> [1] 1 0

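# Logistic Regression
lp <- generate_lp(X, p_nn = k, a = 3)
pis <- plogis(lp)                 # map lp to probabilities with the logistic function
Y <- factor(rbinom(n, 1, pis))    # binary response as a factor
# family is passed through ... to the statistic function
res2 <- knockoff.filter(X, Y, Xk, statistic = stat.glmnet_coefdiff,
                        family = 'binomial', offset = 0, fdr = 0.2)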
res2
#> Call:
#> knockoff.filter(X = X, y = Y, Xk = Xk, statistic = stat.glmnet_coefdiff, 
#>     fdr = 0.2, offset = 0, family = "binomial")
#> 
#> Selected variables:
#> [1]  1  2  4  9 35 77 90
#> 
#> Frequency of selected variables from 10 knockoff copys:
#>   [1]  9 10  2  6  0  0  0  0 10  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0
#>  [76]  0  8  0  0  0  0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  1  3  0  0  0
perf_eval(res2$shat,Ac,Ic)
#> [1] 0.4000000 0.4285714
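
The two numbers printed by perf_eval in the examples are consistent with power (the share of Ac that was selected) followed by the false discovery proportion (the share of selections that fall in Ic). The sketch below reproduces the logistic-regression result under that reading; power_fdp_sketch is a hypothetical helper, not the package's perf_eval.

```r
# Hypothetical helper, not zKnock's perf_eval.
power_fdp_sketch <- function(selected, active, inactive) {
  # Power: fraction of the active set that was selected.
  power <- length(intersect(selected, active)) / max(1, length(active))
  # FDP: fraction of the selections that belong to the inactive set.
  fdp <- length(intersect(selected, inactive)) / max(1, length(selected))
  c(power, fdp)
}

# Selected variables copied from the logistic regression output above.
out <- power_fdp_sketch(c(1, 2, 4, 9, 35, 77, 90), active = 1:10, inactive = 11:100)
out
#> [1] 0.4000000 0.4285714
```

Four of the ten variables in Ac are selected (power 0.4) and three of the seven selections lie in Ic (FDP 3/7).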