Generate knockoff variables using sparse partial least squares (SPLS) regression

create.zpls(
  X,
  n_ko = 1,
  ncomp = NULL,
  eta = 0,
  nb.list = NULL,
  threshold.abs = NULL,
  threshold.q = 0.9,
  verbose = FALSE
)

Arguments

X

A numeric matrix or data frame. The design matrix with \(n\) observations as rows and \(p\) variables as columns.

n_ko

Integer. The number of knockoff copies to generate. Default is 1.

ncomp

Optional. Integer specifying the number of components to use in the SPLS regression. The argument defaults to NULL in the usage above; the documented default number of components is 2.

eta

Optional. Numeric value between 0 and 1 specifying the sparsity level in the SPLS regression. Default is 0 (no sparsity).

nb.list

Optional. A list of length \(p\) or a \(p \times p\) adjacency matrix defining the neighborhoods of variables. If NULL (the default), neighborhoods are determined based on correlations.

threshold.abs

Optional. A numeric value between \(0\) and \(1\) specifying an absolute correlation threshold to define neighborhoods. Default is NULL.

threshold.q

Optional. A numeric value between \(0\) and \(1\) indicating the quantile of the correlation values to use as a threshold. Default is 0.9.

verbose

Logical. Whether to display progress information during the knockoff generation. Default is FALSE.

Value

A matrix of generated knockoff variables of dimensions \(n \times p\).

Details

Knockoff variables are generated by fitting an SPLS regression model for each variable based on its neighborhood.

Neighborhood Generation:

  • If threshold.abs is given, it is used directly as the cutoff on the absolute correlation values.

  • If threshold.q is given, the cutoff is the corresponding quantile of the absolute correlation values.

  • If neither is provided, the cutoff defaults to the 0.9 quantile (90th percentile) of the absolute correlation values.
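The thresholding rules above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual internals: it assumes that the quantile is taken over the off-diagonal absolute correlations, that a variable is never its own neighbor, and that the comparison is strict. The names C and cutoff are hypothetical.

```r
# Illustrative sketch only: correlation-based neighborhoods.
# Assumptions (not confirmed by this page): the quantile is computed over the
# off-diagonal |correlations|, and a variable is excluded from its own neighborhood.
set.seed(10)
X <- matrix(rnorm(200), nrow = 20)   # 20 observations, 10 variables

threshold.abs <- NULL                # absolute cutoff, if supplied
threshold.q   <- 0.9                 # quantile cutoff otherwise

C <- abs(cor(X))                     # absolute correlation matrix
diag(C) <- 0                         # drop self-correlations

cutoff <- if (!is.null(threshold.abs)) {
  threshold.abs
} else {
  quantile(C[upper.tri(C)], probs = threshold.q)
}

# One integer vector of neighbor indices per variable
nb.list <- lapply(seq_len(ncol(X)), function(j) which(C[j, ] > cutoff))
```

A list built this way has the same shape as the nb.list argument (length \(p\)), so a user-defined neighborhood structure could presumably be passed in the same form.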

SPLS Regression: The fitted values of each variable \(X_j\) are calculated by regressing it on the variables in its neighborhood with the spls::spls function. See the documentation of spls::spls for more details.
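As a rough illustration of this step, the sketch below fits a single variable on a hand-picked neighborhood with spls::spls and extracts the fitted values. The neighborhood nb, the simulated data, and the names fit and fitted_j are assumptions made for the example; how create.zpls turns fitted values into knockoff columns is not described on this page and is not reproduced here.

```r
# Illustrative sketch only: SPLS fit of one variable on its neighborhood.
# The neighborhood 'nb' is chosen by hand; it is not the output of create.zpls.
set.seed(1)
n <- 50
p <- 6
X <- matrix(rnorm(n * p), nrow = n)
X[, 2] <- X[, 1] + 0.3 * rnorm(n)    # make variables 2 and 3 correlated with 1
X[, 3] <- X[, 1] + 0.3 * rnorm(n)

j  <- 1                              # target variable X_j
nb <- c(2, 3)                        # assumed neighborhood of X_j

fitted_j <- NULL
if (requireNamespace("spls", quietly = TRUE)) {
  # K = number of components (ncomp), eta = sparsity level
  fit <- spls::spls(x = X[, nb, drop = FALSE], y = X[, j], K = 1, eta = 0.3)
  # Fitted values of X_j given its neighbors
  fitted_j <- predict(fit, newx = X[, nb, drop = FALSE], type = "fit")
}
```

Note that the number of components cannot exceed the number of predictors, so a small neighborhood limits the usable value of K in a fit like this one.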

Examples

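set.seed(10)
# Toy design matrix: 10 observations (rows), 10 variables (columns)
X <- matrix(rnorm(100), nrow = 10)
# One knockoff copy using 3 SPLS components and sparsity level eta = 0.3
Xk <- create.zpls(X = X, ncomp = 3, eta = 0.3)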