Importance statistics based on XGBoost

Computes the difference statistic $$W_j = |Z_j| - |\tilde{Z}_j|$$ where $Z_j$ and $\tilde{Z}_j$ are the SHAP (SHapley Additive exPlanations) values of the $j$th variable and its knockoff, respectively.

stat.SHAP(X, X_k, y, nrounds = 2, ...)

Arguments

X: n-by-p matrix of original variables.
X_k: n-by-p matrix of knockoff variables.
y: vector of length n containing the response variable. If y is a factor, classification is assumed; otherwise, regression is assumed.
nrounds: Number of boosting rounds for training the XGBoost model. Default is 2.
...: additional arguments specific to xgboost (see Details).

Value

A vector of statistics $W$ of length p.

Details

In XGBoost, SHAP (SHapley Additive exPlanations) values interpret the model's predictions by breaking each prediction down into the contributions of the individual features. A SHAP value shows how much a feature increases or decreases a given prediction relative to the average prediction.
XGBoost uses the Tree SHAP algorithm, which efficiently computes these values for tree-based models. This helps in understanding both global feature importance (how features influence the model overall) and local explanations (how features impact individual predictions).
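The decomposition described above can be written out as follows. The notation is introduced here only for illustration and does not appear elsewhere on this page: $f(x_i)$ is the model output for observation $i$ (on the margin scale for classification), $\phi_0$ is the baseline (average) output, and $\phi_{im}$ is the SHAP value of feature $m$ for observation $i$, with $M$ features in total.
$$f(x_i) = \phi_0 + \sum_{m=1}^{M} \phi_{im}$$
A positive $\phi_{im}$ pushes the prediction for observation $i$ above the baseline, and a negative one pushes it below.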
Key benefits include transparency, detailed feature importance, and the ability to explain complex models in a clear, interpretable way.
Saabas values come from an individualized heuristic feature attribution method and can be regarded as an approximation to SHAP values.
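As a rough illustration of the ideas above, the following sketch obtains Tree SHAP contributions from the xgboost R package and turns them into a difference statistic. It is not the implementation of stat.SHAP. Several parts are assumptions made for the example: the model is fit on the augmented matrix cbind(X, X_k), $Z_j$ is taken to be the mean absolute SHAP value of a column, and the "knockoffs" are plain independent noise rather than valid knockoffs. The approxcontrib = TRUE option requests xgboost's fast approximate contributions, which are commonly described as the Saabas heuristic.

```r
# Sketch only: NOT the package's implementation of stat.SHAP.
library(xgboost)

set.seed(1)
n <- 200; p <- 5
X   <- matrix(rnorm(n * p), n, p)   # original variables
X_k <- matrix(rnorm(n * p), n, p)   # placeholder noise, NOT valid knockoffs
y   <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# Assumption: one model is fit on the augmented design [X, X_k]
XX <- cbind(X, X_k)
colnames(XX) <- c(paste0("X", 1:p), paste0("Xk", 1:p))
dtrain <- xgb.DMatrix(data = XX, label = y)
fit <- xgb.train(params = list(objective = "reg:squarederror"),
                 data = dtrain, nrounds = 20)

# Tree SHAP contributions: n rows, 2p feature columns plus one baseline column
shap <- predict(fit, dtrain, predcontrib = TRUE)

# Fast approximate contributions (commonly described as the Saabas heuristic)
saabas <- predict(fit, dtrain, predcontrib = TRUE, approxcontrib = TRUE)

# Assumed aggregation: mean absolute contribution per feature column,
# dropping the final baseline column by position
Z <- colMeans(abs(shap[, seq_len(2 * p), drop = FALSE]))

# Difference statistic: original minus knockoff importance
W <- Z[1:p] - Z[(p + 1):(2 * p)]
W
```

A large positive W[j] suggests that variable j contributes more to the fitted model than its knockoff. Replacing shap with saabas in the aggregation step gives the analogous statistic based on the approximation.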

Examples

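set.seed(2024)
# Ac holds the indices 1..k and Ic the remaining indices (k + 1)..p
n <- 80; p <- 100; k <- 10; Ac <- 1:k; Ic <- (k + 1):p
X <- generate_X(n = n, p = p)
y <- generate_y(X, p_nn = k, a = 3)
Xk <- create.shrink_Gaussian(X = X, n_ko = 10)
# Run the knockoff filter with the SHAP-based importance statistic
res1 <- knockoff.filter(X, y, Xk, statistic = stat.SHAP,
                        offset = 1, fdr = 0.1)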
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.SHAP, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#> integer(0)
#> 
#> Frequency of selected variables from 0 knockoff copys:
#>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
perf_eval(res1$shat, Ac, Ic)
#> [1] 0 0
