Importance statistics based on a GLM with cross-validation

Fits a generalized linear model via penalized maximum likelihood with cross-validation and computes the difference statistic $$W_j = |Z_j| - |\tilde{Z}_j|$$ where $Z_j$ and $\tilde{Z}_j$ are the coefficient estimates for the jth variable and its knockoff, respectively. The regularization parameter $\lambda$ is selected by cross-validation and computed with glmnet.

stat.glmnet_coefdiff(X, X_k, y, family = "gaussian", cores = 2, ...)

Arguments

X

n-by-p matrix of original variables.

X_k

n-by-p matrix of knockoff variables.

y

Response variable vector of length n. Quantitative for family = "gaussian" or "poisson".

For family = "binomial", y should be either:
- a two-level factor,
- a two-column matrix of counts, or
- proportions.
For family = "multinomial", y can be a factor with at least two levels or a matrix.
For family = "cox", y should be a two-column matrix with 'time' and 'status'.
For family = "mgaussian", y is a matrix of quantitative responses.

family

Response type, one of 'gaussian', 'binomial', 'multinomial', 'cox', or 'mgaussian'.

cores

Number of cores to use for parallel computation. Defaults to 2 if available.

...

Additional arguments specific to glmnet (see Details).

Value

A vector of statistics $W$ of length p.

Details

This function uses the glmnet package to fit a GLM model via penalized maximum likelihood. The value of $\lambda$ is chosen by 10-fold cross-validation unless specified otherwise.

The optional nlambda parameter can control the number of $\lambda$ values, and the default is 500. For family="binomial", if the lambda sequence isn't provided, a log-linear sequence is generated.

For more details, see glmnet::cv.glmnet() and glmnet::glmnet().

Examples

set.seed(2024)
n=80; p=100; k=10; Ac = 1:k; Ic = (k+1):p
X = generate_X(n=n,p=p)
y <- generate_y(X, p_nn=k, a=3)
Xk = create.shrink_Gaussian(X = X, n_ko = 10)
res1 = knockoff.filter(X, y, Xk, statistic = stat.glmnet_coefdiff,
                       offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_coefdiff, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> Frequency of selected variables from 10 knockoff copys:
#>   [1] 10 10 10 10 10  8 10 10  9 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
perf_eval(res1$shat,Ac,Ic)
#> [1] 1 0

Importance statistics based on a GLM with cross-validation

Arguments

Value

Details

See also

Examples