stat.forward_selection.Rd
Computes the statistic $$W_j = \max(Z_j, Z_{j+p}) \cdot \mathrm{sgn}(Z_j - Z_{j+p}),$$ where \(Z_1,\dots,Z_{2p}\) give the reverse order in which the 2p variables (the originals and the knockoffs) enter the forward selection model. See the Details for information about forward selection.
stat.forward_selection(X, X_k, y, omp = F)
A vector of statistics \(W\) of length p.
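To make the definition of \(W_j\) concrete, here is a minimal R sketch that turns an entry order into \(Z\) and then \(W\). It assumes a hypothetical vector `path` that lists the column indices of `cbind(X, X_k)` (originals 1..p, knockoffs p+1..2p) in the order they enter the model. The first column to enter receives \(Z = 2p\) and the last receives \(Z = 1\). The helper name `w_from_path` is illustrative and is not the package's internal code.

```r
# Hypothetical helper (not the package's implementation): compute W from an
# entry order. path is a permutation of 1:(2p) giving the order in which the
# columns of cbind(X, X_k) enter the model (originals 1..p, knockoffs p+1..2p).
w_from_path <- function(path) {
  m <- length(path)
  stopifnot(m %% 2 == 0)
  p <- m / 2
  Z <- integer(m)
  Z[path] <- m:1            # reverse entry order: first to enter gets Z = 2p
  orig <- Z[1:p]            # Z_j for the original variables
  ko <- Z[(p + 1):m]        # Z_{j+p} for their knockoffs
  pmax(orig, ko) * sign(orig - ko)
}

# Toy example with p = 3: variable 1 enters first, the knockoff of variable 2
# (column 5) enters before variable 2, and the knockoff of variable 3
# (column 6) enters before variable 3.
w_from_path(c(1, 5, 2, 6, 4, 3))  # returns c(6, -5, -3)
```

A positive \(W_j\) means the original variable entered before its knockoff. Its magnitude reflects how early the first of the pair entered.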
In forward selection, variables are chosen iteratively to maximize the inner product with the residual from the previous step. The initial residual is always y. In standard forward selection (stat.forward_selection with the default omp = FALSE), the next residual is the remainder after regressing the current residual on the newly selected variable. When orthogonal matching pursuit is used (omp = TRUE), the next residual is the remainder after regressing y on all variables selected so far.
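The two residual updates can be sketched in R as follows. This is an illustration under stated assumptions, not the package's internal routine. The helper name `fs_path` is hypothetical. The sketch rescales columns to unit Euclidean norm and takes "inner product" to mean its absolute value; both are assumptions. The OMP branch maintains an orthonormal basis of the selected columns by Gram-Schmidt, so projecting y onto that basis gives the regression on all selected variables.

```r
# Hypothetical sketch: return the order in which the columns of X enter a
# greedy forward-selection path. Assumes unit-norm columns and that the
# selection criterion is the absolute inner product with the residual.
fs_path <- function(X, y, omp = FALSE) {
  X <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # rescale columns to unit norm
  n <- nrow(X)
  m <- ncol(X)
  path <- integer(0)
  residual <- y                              # initial residual is y
  Q <- matrix(0, n, 0)                       # orthonormal basis (OMP only)
  for (step in seq_len(m)) {
    avail <- setdiff(seq_len(m), path)
    scores <- abs(crossprod(X[, avail, drop = FALSE], residual))
    best <- avail[which.max(scores)]
    path <- c(path, best)
    x <- X[, best]
    if (omp) {
      # Orthogonalize the new column against those already selected, then
      # take the residual of y on the span of all selected columns.
      if (ncol(Q) > 0) x <- x - Q %*% crossprod(Q, x)
      nx <- sqrt(sum(x^2))
      if (nx > 1e-10) {                      # skip columns already in the span
        Q <- cbind(Q, x / nx)
        residual <- y - Q %*% crossprod(Q, y)
      }
    } else {
      # Regress the current residual on the single selected (unit-norm) column.
      residual <- residual - sum(x * residual) * x
    }
  }
  path
}

set.seed(1)
X <- matrix(rnorm(50 * 5), 50, 5)
y <- 3 * X[, 2] + rnorm(50, sd = 0.1)
fs_path(X, y)              # standard forward selection; column 2 enters first
fs_path(X, y, omp = TRUE)  # orthogonal matching pursuit
```

The standard update only removes the component along the latest column, so earlier directions can leak back into the residual. The OMP update avoids this at the cost of maintaining the basis `Q`.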
Other statistics: stat.SHAP(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.random_forest(), stat.sqrt_lasso(), stat.stability_selection(), stat.xgboost()
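set.seed(2024)
n <- 80; p <- 100; k <- 10
Ac <- 1:k        # indices of the true signals (non-null variables)
Ic <- (k + 1):p  # indices of the null variables
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = k, a = 3)
# construct 10 knockoff copies of X
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
res1 <- knockoff.filter(X, y, Xk, statistic = stat.forward_selection,
                        offset = 1, fdr = 0.1)
res1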
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.forward_selection,
#> fdr = 0.1, offset = 1)
#>
#> Selected variables:
#> [1] 4 7 9 10 13 27 29 33 37 39 43 47 49 53 59 63 69 73 87 89 93
#>
#> Frequency of selected variables from 10 knockoff copys:
#> [1] 0 0 4 10 0 0 9 0 7 7 0 0 9 1 0 0 5 0 5 0 0 2 2 0 0
#> [26] 0 9 0 9 1 0 3 10 4 0 0 8 0 8 1 0 1 6 4 0 0 6 0 6 3
#> [51] 0 3 9 3 0 0 3 0 8 3 0 4 8 1 0 0 4 0 6 0 0 0 9 1 0
#> [76] 0 3 0 4 1 0 1 5 1 0 0 6 0 7 4 0 5 7 5 0 0 4 0 3 0
perf_eval(res1$shat,Ac,Ic)
#> [1] 0.4000000 0.8095238
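The two numbers printed by perf_eval appear to be the power and the false discovery proportion (FDP) of the selected set. This reading is an assumption, not stated on this page, but it reproduces both values exactly from the selected variables and the true-signal set Ac in the example above.

```r
# Selected variables and true signals copied from the example above.
sel <- c(4, 7, 9, 10, 13, 27, 29, 33, 37, 39, 43, 47, 49, 53, 59, 63,
         69, 73, 87, 89, 93)
Ac <- 1:10
power <- mean(Ac %in% sel)   # fraction of true signals selected: 4/10
fdp <- mean(!(sel %in% Ac))  # fraction of selections that are null: 17/21
c(power, fdp)
```

Of the 21 selected variables, only 4, 7, 9 and 10 are true signals. This gives a power of 0.4 and an FDP of 17/21, about 0.81, in this single run.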