stat.random_forest.Rd
Computes the difference statistic $$W_j = |Z_j| - |\tilde{Z}_j|$$ where \(Z_j\) and \(\tilde{Z}_j\) are the random forest feature importances of the jth variable and its knockoff, respectively.
stat.random_forest(X, X_k, y, ...)
A vector of statistics \(W\) of length p.
This function uses the ranger package to compute variable importance measures. The importance of a variable is measured as the total decrease in node impurities from splitting on that variable, averaged over all trees.
For regression, the node impurity is measured by residual sum of squares.
For classification, it is measured by the Gini index.
For a complete list of the available additional arguments, see ranger::ranger().
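The computation described above can be sketched as follows. This is a minimal illustration, not the package's own implementation: the helper name `rf_importance_diff` is hypothetical, and fitting a single forest on the column-bound matrix of `X` and `X_k` is an assumption. The independent Gaussian matrix used as `X_k` is only a stand-in for real knockoffs, reasonable here because the simulated columns of `X` are independent standard normals.

```r
library(ranger)

# Hypothetical helper (an assumption, not part of the package): fit one forest
# on the augmented matrix [X, X_k] and difference the impurity importances.
rf_importance_diff <- function(X, X_k, y, ...) {
  p <- ncol(X)
  XX <- data.frame(cbind(X, X_k))
  names(XX) <- paste0("V", seq_len(2 * p))
  # importance = "impurity": variance reduction for regression,
  # Gini index for classification
  fit <- ranger::ranger(x = XX, y = y, importance = "impurity", ...)
  Z <- unname(fit$variable.importance)
  # W_j = |Z_j| - |Z~_j|: original minus knockoff importance
  abs(Z[1:p]) - abs(Z[(p + 1):(2 * p)])
}

set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)
X_k <- matrix(rnorm(n * p), n, p)  # stand-in knockoffs (assumption, see above)
y <- 2 * X[, 1] + rnorm(n)         # only the first variable carries signal
W <- rf_importance_diff(X, X_k, y, num.trees = 200)
W
```

Additional arguments such as `num.trees` are passed through `...` to `ranger::ranger()`, mirroring how the documented function forwards its extra arguments.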
Other statistics: stat.SHAP(), stat.forward_selection(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.sqrt_lasso(), stat.stability_selection(), stat.xgboost()
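set.seed(2024)
# Problem size, plus the first k indices (Ac) and the remaining indices (Ic),
# which are passed to perf_eval() below
n <- 80; p <- 100; k <- 10; Ac <- 1:k; Ic <- (k + 1):p
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = k, a = 3)
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
# Run the knockoff filter with the random forest importance statistic
res1 <- knockoff.filter(X, y, Xk, statistic = stat.random_forest,
                        offset = 1, fdr = 0.1)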
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.random_forest,
#> fdr = 0.1, offset = 1)
#>
#> Selected variables:
#> integer(0)
#>
#> Frequency of selected variables from 0 knockoff copys:
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
perf_eval(res1$shat,Ac,Ic)
#> [1] 0 0