Fits a generalized linear model via penalized maximum likelihood and computes the difference statistic $$W_j = Z_j - \tilde{Z}_j$$ where \(Z_j\) and \(\tilde{Z}_j\) are the maximum values of the regularization parameter \(\lambda\) at which the jth variable and its knockoff enter the model, respectively.

stat.glmnet_lambdadiff(X, X_k, y, family = "gaussian", ...)

Arguments

X

n-by-p matrix of original variables.

X_k

n-by-p matrix of knockoff variables.

y

Response variable vector of length n. Quantitative for family = "gaussian" or "poisson".

  • For family = "binomial", y should be either:

    • a two-level factor,

    • a two-column matrix of counts, or

    • proportions.

  • For family = "multinomial", y can be a factor with at least two levels or a matrix.

  • For family = "cox", y should be a two-column matrix with 'time' and 'status'.

  • For family = "mgaussian", y is a matrix of quantitative responses.

family

response type (see above).

...

additional arguments specific to glmnet (see Details).

Value

A vector of statistics \(W\) of length p.

Details

This function uses glmnet to compute the regularization path on a fine grid of \(\lambda\)'s.

The nlambda parameter can be used to control the granularity of the grid of \(\lambda\)'s. The default value of nlambda is 500.

If the family is 'binomial' and a lambda sequence is not provided by the user, this function generates it on a log-linear scale before calling 'glmnet'.

The default response family is 'gaussian', for a linear regression model. Different response families (e.g. 'binomial') can be specified by passing an optional parameter 'family'.

For a complete list of the available additional arguments, see glmnet::glmnet().

Examples

set.seed(2024)
n=80; p=100; k=10; Ac = 1:k; Ic = (k+1):p
X = generate_X(n=n,p=p)
y <- generate_y(X, p_nn=k, a=3)
Xk = create.shrink_Gaussian(X = X, n_ko = 10)
res1 = knockoff.filter(X, y, Xk, statistic = stat.glmnet_lambdadiff,
                       offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_lambdadiff, 
#>     fdr = 0.1, offset = 1)
#> 
#> Selected variables:
#> [1]  1  3  6  7  9 10
#> 
#> Frequency of selected variables from 60 knockoff copys:
#>   [1] 10  0 10  0  0 10 10  0 10 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
perf_eval(res1$shat,Ac,Ic)
#> [1] 0.6 0.0