stat.glmnet_coefdiff.Rd
Fits a generalized linear model via penalized maximum likelihood with cross-validation
and computes the difference statistic
$$W_j = |Z_j| - |\tilde{Z}_j|$$
where \(Z_j\) and \(\tilde{Z}_j\) are the coefficient estimates for the
jth variable and its knockoff, respectively. The regularization parameter
\(\lambda\) is selected by cross-validation and computed with glmnet
.
stat.glmnet_coefdiff(X, X_k, y, family = "gaussian", cores = 2, ...)
n-by-p matrix of original variables.
n-by-p matrix of knockoff variables.
Response variable vector of length n. Quantitative for family = "gaussian" or "poisson".
For family = "binomial", y should be either:
a two-level factor,
a two-column matrix of counts, or
proportions.
For family = "multinomial", y can be a factor with at least two levels or a matrix.
For family = "cox", y should be a two-column matrix with 'time' and 'status'.
For family = "mgaussian", y is a matrix of quantitative responses.
Response type, one of 'gaussian', 'binomial', 'multinomial', 'cox', or 'mgaussian'.
Number of cores to use for parallel computation. Defaults to 2 if available.
Additional arguments specific to glmnet
(see Details).
A vector of statistics \(W\) of length p.
This function uses the glmnet
package to fit a GLM model via penalized maximum likelihood.
The value of \(\lambda\) is chosen by 10-fold cross-validation unless specified otherwise.
The optional nlambda
parameter can control the number of \(\lambda\) values, and the default
is 500. For family="binomial", if the lambda sequence isn't provided, a log-linear sequence is generated.
For more details, see glmnet::cv.glmnet()
and glmnet::glmnet()
.
Other statistics:
stat.SHAP()
,
stat.forward_selection()
,
stat.glmnet_lambdadiff()
,
stat.random_forest()
,
stat.sqrt_lasso()
,
stat.stability_selection()
,
stat.xgboost()
set.seed(2024)
n=80; p=100; k=10; Ac = 1:k; Ic = (k+1):p
X = generate_X(n=n,p=p)
y <- generate_y(X, p_nn=k, a=3)
Xk = create.shrink_Gaussian(X = X, n_ko = 10)
res1 = knockoff.filter(X, y, Xk, statistic = stat.glmnet_coefdiff,
offset = 1, fdr = 0.1)
res1
#> Call:
#> knockoff.filter(X = X, y = y, Xk = Xk, statistic = stat.glmnet_coefdiff,
#> fdr = 0.1, offset = 1)
#>
#> Selected variables:
#> [1] 1 2 3 4 5 6 7 8 9 10
#>
#> Frequency of selected variables from 10 knockoff copys:
#> [1] 10 10 10 10 10 8 10 10 9 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
perf_eval(res1$shat,Ac,Ic)
#> [1] 1 0