This function generates sparse sequential knockoff copies of the input data frame X. It first estimates the adjacency matrix of X, which marks the zero and non-zero entries of the precision matrix of X. It then applies a modified sequential knockoff algorithm in which each regression includes only the covariates that correspond to non-zero entries of the precision matrix of X. Using a sparse model reduces the number of covariates per regression, improving efficiency.
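The two-step idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name sparse_seq_sketch is hypothetical, it handles numeric columns only, it always uses least squares, and it assumes each column is regressed on its neighbours in X plus the knockoffs already generated for those neighbours.

```r
# Hypothetical helper (not part of the package): a simplified sparse
# sequential knockoff sampler for numeric columns, least squares only.
sparse_seq_sketch <- function(X, adjacency) {
  X <- as.data.frame(X)
  n <- nrow(X)
  p <- ncol(X)
  Xk <- X  # same shape as X; every column is overwritten below
  for (j in seq_len(p)) {
    # Neighbours of column j: non-zero entries in row j of the adjacency matrix
    nb <- setdiff(which(adjacency[j, ] != 0), j)
    # Neighbours whose knockoffs have already been sampled
    done <- nb[nb < j]
    # Design matrix: intercept, neighbouring originals, neighbouring knockoffs
    D <- cbind(rep(1, n),
               as.matrix(X[, nb, drop = FALSE]),
               as.matrix(Xk[, done, drop = FALSE]))
    fit <- lm.fit(D, X[[j]])
    sigma <- sqrt(sum(fit$residuals^2) / fit$df.residual)
    # Sample the knockoff from the fitted Gaussian conditional distribution
    Xk[[j]] <- rnorm(n, mean = fit$fitted.values, sd = sigma)
  }
  Xk
}

# Toy data: chain x1 - x2 - x3, with x4 independent of the rest
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n))
X$x2 <- 0.7 * X$x1 + rnorm(n)
X$x3 <- 0.7 * X$x2 + rnorm(n)
X$x4 <- rnorm(n)

# Adjacency matrix of the chain (assumed known here for illustration)
A <- matrix(0, 4, 4)
A[cbind(c(1, 2), c(2, 3))] <- 1
A <- A + t(A)

Xk <- sparse_seq_sketch(X, A)
```

In this toy example the regression for x4 contains only an intercept, and the regression for x3 uses x2 and the knockoff of x2 but not x1, which is where the reduction in covariates per regression comes from.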

create.sparse_seq(X, n_ko = 1, adjacency.matrix = NULL, verbose = TRUE)

Arguments

X

A data frame or tibble with numeric and factor columns only. The number of columns, ncol(X), must be greater than 2.

n_ko

Integer. The number of knockoff matrices to generate. Default is 1.

adjacency.matrix

Optional. A user-specified adjacency matrix (binary indicator matrix corresponding to the non-zero elements of the precision matrix of X). If not provided, it is estimated within the function.

verbose

Logical. Whether to display progress information during the knockoff generation. Default is TRUE.
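For the adjacency.matrix argument, the following base R sketch shows one way a binary indicator matrix could be derived from an estimated precision matrix. It is an illustration only: thresholding partial correlations at 0.2 is an arbitrary stand-in chosen for this example, not the estimator used inside the function, and the convention the function expects for the diagonal is not stated here.

```r
# Toy data with a chain structure x1 - x2 - x3 and an independent x4
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)
x3 <- 0.8 * x2 + rnorm(n)
x4 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4)

# Precision matrix estimate: inverse of the sample covariance matrix
Omega <- solve(cov(X))

# Partial correlations are the negated, rescaled off-diagonal entries
pcor <- -cov2cor(Omega)
diag(pcor) <- 0

# Binary indicator matrix: 1 where the partial correlation is clearly non-zero
# (the 0.2 cutoff is an assumption made for this example)
adj <- (abs(pcor) > 0.2) * 1
adj
```

A matrix built this way has the same shape as what the argument describes (one row and one column per column of X), so it could in principle be supplied as adjacency.matrix = adj.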

Value

A list of data frames or tibbles, each being a sparse sequential knockoff copy of X, with the same type and dimensions as X.

Details

To enhance speed, least squares regression is used by default, unless the number of covariates exceeds half the number of observations (i.e., when p > n/2), in which case elastic net regularized regression is applied.
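The switching rule above can be written out as a small sketch. The helper name choose_regression is hypothetical and only mirrors the stated condition; it does not reproduce how the package counts covariates or fits either model.

```r
# Hypothetical helper mirroring the stated rule: elastic net when p > n/2,
# least squares otherwise.
choose_regression <- function(n, p) {
  if (p > n / 2) "elastic net" else "least squares"
}

choose_regression(n = 100, p = 6)   # "least squares"
choose_regression(n = 100, p = 50)  # "least squares" (p equals n/2, not greater)
choose_regression(n = 100, p = 60)  # "elastic net"
```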

Examples

set.seed(1)
X <- generate_X(n = 100, p = 6, p_b = 2, cov_type = "cov_equi", rho = 0.5)
Xk <- create.sparse_seq(X)
#> Optimal tuning parameter on boundary... consider providing a smaller lam value or decreasing lam.min.ratio!
#> -- Generating knockoff matrix 1 
#> # weights:  7 (6 variable)
#> initial  value 69.314718 
#> iter  10 value 47.823304
#> final  value 47.817075 
#> converged
#> # weights:  11 (10 variable)
#> initial  value 69.314718 
#> iter  10 value 51.798225
#> final  value 51.756322 
#> converged