knockoff.filter.Rd
Runs the knockoff filter, which selects the variables relevant for predicting the outcome of interest using knockoff copies of the predictors and test statistics.
knockoff.filter(
X,
y,
Xk = NULL,
statistic = stat.glmnet_coefdiff,
aggregate = agg_Freq,
fdr = 0.1,
offset = 1,
verbose = FALSE,
...
)
X: A numeric n-by-p matrix or data frame of predictors.
y: A response vector of length n.
Xk: A list of knockoff copies (default: NULL).
statistic: A function to compute test statistics (default: stat.glmnet_coefdiff).
aggregate: A function to aggregate results from multiple knockoffs (default: agg_Freq).
fdr: Target false discovery rate (default: 0.1).
offset: Offset for threshold computation (0 or 1; default: 1).
verbose: Logical; if TRUE, prints progress messages during knockoff generation and statistic calculation (default: FALSE).
...: Additional arguments passed to the statistic function.
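The roles of fdr and offset can be illustrated with the usual knockoff threshold from the knockoff literature: the smallest t for which (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) is at most fdr. This page does not state the exact formula used internally, so the following is only a sketch under that assumption. The helper knockoff_threshold_sketch and the toy vector W are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): the standard knockoff
# threshold, assuming the usual definition from the knockoff literature.
knockoff_threshold_sketch <- function(W, fdr = 0.1, offset = 1) {
  # Candidate thresholds: distinct nonzero magnitudes of W, in ascending order.
  ts <- sort(unique(abs(W[W != 0])))
  # Estimated false discovery proportion at each candidate threshold.
  ratio <- vapply(ts, function(t) {
    (offset + sum(W <= -t)) / max(1, sum(W >= t))
  }, numeric(1))
  ok <- which(ratio <= fdr)
  # Inf means no candidate meets the target, so nothing would be selected.
  if (length(ok) > 0) ts[ok[1]] else Inf
}

# Toy statistics: ten large positive values and three small ones near zero.
W <- c(5, 4, 3.5, 3, 2.5, 2, 1.8, 1.6, 1.4, 1.2, -0.3, 0.2, -0.1)
tau <- knockoff_threshold_sketch(W, fdr = 0.1, offset = 1)
selected <- which(W >= tau)
```

With offset = 1 the numerator is never zero, which makes the threshold more conservative than with offset = 0; that is why at least 1/fdr selections are needed before anything can be selected at all.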
An object of class knockoff.filter, containing:
Single Knockoff Case:
call: The matched call of the function.
W: The test statistics for the original variables.
threshold: The computed selection threshold.
shat: The indices of variables selected based on the threshold.
Multiple Knockoff Case:
call: The matched call of the function.
shat: The aggregated indices of selected variables.
Ws: The matrix of test statistics for multiple knockoff copies.
thresholds: A vector of thresholds for each knockoff.
shat_list: A list where each element contains the indices of selected variables for a corresponding knockoff copy.
shat_mat: A binary matrix where each row indicates the selected variables for a specific knockoff copy (1 for selected, 0 for not selected).
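To make the relationship between shat_list, shat_mat, and the aggregated shat concrete, here is a small sketch on toy data. The aggregation rule shown (keep a variable if at least half of the copies select it) is an assumption chosen for illustration; it is not taken from the documentation of agg_Freq, whose actual rule may differ.

```r
# Toy selections from 4 hypothetical knockoff copies over p = 6 variables.
p <- 6
shat_list <- list(c(1, 2, 3), c(1, 2), c(1, 3, 5), c(1, 2, 3))

# One row per knockoff copy, one column per variable (1 = selected, 0 = not).
shat_mat <- t(vapply(
  shat_list,
  function(s) as.integer(seq_len(p) %in% s),
  integer(p)
))

# How many copies selected each variable.
freq <- colSums(shat_mat)

# Assumed rule (illustration only): keep variables chosen by at least half of the copies.
shat <- which(freq >= 0.5 * length(shat_list))
```

Here freq counts selections per variable across copies, which corresponds to the "Frequency of selected variables" line printed in the examples.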
Candes, E., Fan, Y., Janson, L., & Lv, J. (2018). Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3), 551-577.
# Linear Regression
set.seed(2024)
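n = 80; p = 100; k = 10
Ac = 1:k        # indices passed to perf_eval as the true signals
Ic = (k+1):p    # remaining indices, passed to perf_eval as the nulls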
X = generate_X(n=n,p=p)
y <- generate_y(X, p_nn=k, a=3)
Xk = create.shrink_Gaussian(X = X, n_ko = 10)
res1 = knockoff.filter(X, y, Xk, statistic = stat.glmnet_coefdiff,
                       offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_coefdiff,
#> fdr = 0.1, offset = 1)
#>
#> Selected variables:
#> [1] 1 2 3 4 5 6 7 8 9 10
#>
#> Frequency of selected variables from 10 knockoff copys:
#> [1] 10 10 10 10 10 8 10 10 9 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
perf_eval(res1$shat,Ac,Ic)
#> [1] 1 0
# Logistic Regression
lp <- generate_lp(X, p_nn=k, a=3)
pis <- plogis(lp)
Y <- factor(rbinom(n, 1, pis))
res2 = knockoff.filter(X, Y, Xk, statistic = stat.glmnet_coefdiff,
                       family = 'binomial', offset = 0, fdr = 0.2)
res2
#> Call:
#> knockoff.filter(X = X, y = Y, Xk = Xk, statistic = stat.glmnet_coefdiff,
#> fdr = 0.2, offset = 0, family = "binomial")
#>
#> Selected variables:
#> [1] 1 2 4 9 35 77 90
#>
#> Frequency of selected variables from 10 knockoff copys:
#> [1] 9 10 2 6 0 0 0 0 10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
#> [26] 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [51] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0
#> [76] 0 8 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 1 3 0 0 0
perf_eval(res2$shat,Ac,Ic)
#> [1] 0.4000000 0.4285714