This function generates knockoff variables using Partial Least Squares (PLS) regression, following the PLSKO algorithm. It is intended for high-dimensional data.

create.pls(
  X,
  n_ko = 1,
  ncomp = NULL,
  sparsity = 1,
  nb.list = NULL,
  threshold.abs = NULL,
  threshold.q = 0.9,
  verbose = FALSE
)

Arguments

X

A numeric matrix or data frame. The original design data matrix with \(n\) observations as rows and \(p\) variables as columns.

n_ko

An integer specifying the number of knockoff copies of X to generate. Default is 1.

ncomp

Optional. An integer specifying the number of components to use in the PLS regression. Default is NULL, in which case the number of components is chosen empirically.

sparsity

Optional. A numeric value between 0 and 1 specifying the sparsity level in the PLS regression. Default is 1 (no sparsity).

nb.list

Optional. A list of length \(p\) or a \(p \times p\) adjacency matrix that defines the neighbor relationships among variables. Default is NULL.

  • A list of length \(p\) should contain the neighbor indices of each variable, ordered from \(X_1\) to \(X_p\). The \(i^{th}\) element holds the indices of the neighbors of \(X_i\), or is NULL when \(X_i\) has no neighbors.

  • An adjacency matrix should be symmetric with binary elements. \(M_{ij} = 1\) indicates that \(X_i\) and \(X_j\) are neighbors; \(M_{ij} = 0\) indicates that they are not. Diagonal entries are 0.

  • If not provided or NULL, neighborhoods are determined from the correlations between variables (see threshold.abs and threshold.q).
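The two accepted formats for nb.list can be illustrated with a small base R sketch. The four-variable neighbor structure and the object names (nb_list, adj, nb_from_adj) are made up for illustration and are not part of the package; the conversion shown is only one plausible way to move between the two formats.

```r
# Toy example with p = 4 variables; the neighbor structure is invented.
p <- 4

# Format 1: a list of length p. Element i holds the neighbor indices of X_i,
# or NULL when X_i has no neighbors.
nb_list <- vector("list", p)   # every element starts as NULL
nb_list[[1]] <- c(2L, 3L)      # X_1 is a neighbor of X_2 and X_3
nb_list[[2]] <- 1L
nb_list[[3]] <- 1L
# nb_list[[4]] stays NULL: X_4 has no neighbors

# Format 2: a symmetric, binary p x p adjacency matrix with a zero diagonal.
adj <- matrix(0L, nrow = p, ncol = p)
for (i in seq_len(p)) {
  nb <- nb_list[[i]]
  if (length(nb) > 0) adj[i, nb] <- 1L
}

# Converting the matrix back to the list format.
nb_from_adj <- lapply(seq_len(p), function(i) {
  idx <- which(adj[i, ] == 1L)
  if (length(idx) == 0) NULL else idx
})
```

Either object could then be supplied as the nb.list argument. If the list is built by hand, it is worth checking that the relationships are mutual (if 2 is listed for \(X_1\), then 1 is listed for \(X_2\)), since the matrix format is required to be symmetric.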

threshold.abs

Optional. A numeric value between 0 and 1 specifying an absolute correlation threshold for defining neighborhoods. Default is NULL.

threshold.q

Optional. A numeric value between 0 and 1 indicating the quantile of the correlation values to use as a threshold for defining neighborhoods. Default is 0.9.
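To make the idea of a correlation-based neighborhood concrete, the following base R sketch derives a neighbor list from a quantile cutoff. It is an illustration under stated assumptions, not the package's internal code: it assumes that neighbors are pairs of variables whose absolute Pearson correlation exceeds the cutoff, and that the quantile is taken over the off-diagonal absolute correlations. The names abs_cor, cutoff, and threshold_q are hypothetical.

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 20)   # n = 20 observations, p = 10 variables
p <- ncol(X)

# Absolute pairwise correlations; a variable is not its own neighbor.
abs_cor <- abs(cor(X))
diag(abs_cor) <- 0

# Assumed rule: cutoff = the chosen quantile of the off-diagonal
# absolute correlations (each pair counted once).
threshold_q <- 0.9
cutoff <- quantile(abs_cor[upper.tri(abs_cor)], probs = threshold_q)

# Binary adjacency matrix: 1 where the absolute correlation exceeds the cutoff.
adj <- (abs_cor > cutoff) * 1L

# Equivalent neighbor list, with NULL for variables that have no neighbors.
nb_list <- lapply(seq_len(p), function(i) {
  idx <- which(adj[i, ] == 1L)
  if (length(idx) == 0) NULL else idx
})
```

Under the same assumption, an absolute threshold would replace the quantile step with a fixed value, for example cutoff <- 0.5. A higher cutoff gives smaller neighborhoods.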

verbose

Logical. Whether to display progress information during knockoff generation. Default is FALSE.

Value

A list of generated knockoff matrices, where each matrix has \(n\) rows (observations) and \(p\) columns (variables).

References

Yang, Guannan, et al. "PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection." bioRxiv (2024): 2024-08.

Examples

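set.seed(10)
# Simulate a 10 x 10 design matrix of standard normal draws
X <- matrix(rnorm(100), nrow = 10)
# Generate knockoffs with the default n_ko = 1, using 3 PLS components
Xk <- create.pls(X = X, ncomp = 3)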