Computes the difference statistic \(W_j = |Z_j| - |\tilde{Z}_j|\), where \(Z_j\) and \(\tilde{Z}_j\) are the xgboost feature importance scores of the jth variable and its knockoff, respectively.
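The statistic can be sketched in a few lines of base R. In this illustration, absolute marginal correlations stand in for the xgboost importance scores \(Z_j\) (an assumption made only to keep the sketch self-contained; `stat.xgboost` itself fits an xgboost model on `cbind(X, X_k)` and uses its importances), and the knockoff matrix is plain noise rather than a valid knockoff construction.

```r
# Sketch of W_j = |Z_j| - |Z~_j| with stand-in importance scores.
# NOTE: abs(cor(., y)) replaces the xgboost importances used by the
# real stat.xgboost, and Xk here is NOT a valid knockoff matrix.
set.seed(1)
n <- 50; p <- 5
X  <- matrix(rnorm(n * p), n, p)   # original variables
Xk <- matrix(rnorm(n * p), n, p)   # stand-in "knockoffs" (illustration only)
y  <- X[, 1] + rnorm(n)            # only variable 1 is truly active

Z  <- abs(cor(X,  y))              # stand-in importances, originals
Zk <- abs(cor(Xk, y))              # stand-in importances, knockoffs
W  <- as.vector(Z - Zk)            # one statistic per variable
length(W)                          # p statistics
```

A large positive \(W_j\) indicates that the original variable is markedly more important to the model than its knockoff, which is the evidence the filter uses for selection.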

stat.xgboost(X, X_k, y, family = "gaussian", nrounds = 10, ...)

Arguments

X

n-by-p matrix of original variables.

X_k

n-by-p matrix of knockoff variables.

y

vector of length n containing the response variable. If a factor, classification is assumed; otherwise regression is assumed.

family

specifies the type of model to be fit: 'gaussian' for regression or 'binomial' for classification.

nrounds

number of boosting rounds for xgboost.

...

additional arguments specific to xgboost (see Details).

Value

A vector of statistics \(W\) of length p.
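The returned \(W\) is thresholded inside `knockoff.filter`; the sketch below shows how a knockoff+ style threshold (corresponding to `offset = 1`) could turn a statistic vector into a selection set. This is an illustrative re-implementation under stated assumptions, not the package's internal code, and `knockoff_plus_select` is a hypothetical helper name.

```r
# Sketch of knockoff+ selection from a statistic vector W.
# knockoff.filter performs this thresholding internally; this
# stand-alone version is for exposition only.
knockoff_plus_select <- function(W, fdr = 0.1, offset = 1) {
  ts <- sort(abs(W[W != 0]))          # candidate thresholds
  for (t in ts) {
    # estimated FDP at threshold t (offset = 1 gives knockoff+)
    ratio <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (ratio <= fdr) return(which(W >= t))
  }
  integer(0)                          # no threshold works: select nothing
}

W <- c(3.1, 2.4, -0.2, 0.1, 1.8, -1.0, 0.0, 2.9)
knockoff_plus_select(W, fdr = 0.25)   # indices with W above the threshold
knockoff_plus_select(W, fdr = 0.1)    # stricter target: empty selection
```

An empty result at a strict `fdr`, as in the example below, simply means no threshold achieves the requested false discovery rate estimate.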

Examples

set.seed(2024)
n <- 80; p <- 100; k <- 10
Ac <- 1:k
Ic <- (k + 1):p
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = k, a = 3)
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
res1 <- knockoff.filter(X, y, Xk, statistic = stat.xgboost,
                        offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.xgboost, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#> integer(0)
#> 
#> Frequency of selected variables from 0 knockoff copys:
#>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
perf_eval(res1$shat, Ac, Ic)
#> [1] 0 0