Generate a knockoff variable set with PLSKO using PLS regression

create.pls.knockoff(
  X,
  nb.list = NULL,
  threshold.abs = NULL,
  threshold.q = 0.9,
  ncomp = NULL,
  sparsity = 1
)

Arguments

X

A numeric matrix or data frame. The original design data matrix with \(n\) observations as rows and \(p\) variables as columns.

nb.list

Optional. A list of length \(p\) or a \(p \times p\) adjacency matrix that defines the neighbourhood of each variable. In a list, the \(i^{th}\) element contains the indices of the neighbours of \(X_i\), in order from \(X_1\) to \(X_p\), or NULL when \(X_i\) has no neighbours. An adjacency matrix must be symmetric and binary, with \(M_{ij} = 1\) when \(X_i\) and \(X_j\) are neighbours and \(M_{ij} = 0\) otherwise, including on the diagonal (\(i = j\)). If not provided or NULL, the neighbourhoods are determined from correlations.
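
As an illustration of the two accepted forms, the sketch below builds a neighbour list for a toy set of four variables and converts it to the equivalent adjacency matrix. The toy layout and the helper `list_to_adjacency()` are hypothetical and not part of the package.

```r
# Toy example with p = 4 variables:
# X1-X2 and X2-X3 are neighbours; X4 has no neighbours.
p <- 4
nb.list <- list(
  2L,         # neighbours of X1
  c(1L, 3L),  # neighbours of X2
  2L,         # neighbours of X3
  NULL        # X4 has no neighbours
)

# Hypothetical helper (not part of the package): turn a neighbour list
# into a binary adjacency matrix with a zero diagonal.
list_to_adjacency <- function(nb, p) {
  M <- matrix(0L, nrow = p, ncol = p)
  for (i in seq_len(p)) {
    idx <- nb[[i]]
    if (length(idx) > 0) {
      M[i, idx] <- 1L
    }
  }
  M
}

M <- list_to_adjacency(nb.list, p)

# The matrix form should be symmetric with zeros on the diagonal.
stopifnot(isSymmetric(M), all(diag(M) == 0))
```

Either `nb.list` or `M` describes the same neighbourhood structure, so either could be supplied as the `nb.list` argument.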

threshold.abs

Optional. A numeric value between \(0\) and \(1\) specifying an absolute correlation threshold used to define neighbourhoods. Default is NULL.

threshold.q

Optional. A numeric value between 0 and 1 indicating the quantile of the correlation values to use as a threshold. Default is 0.9.
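
A minimal sketch of how a correlation threshold could translate into neighbourhoods when `nb.list` is not supplied. It illustrates the general idea only and is not the package's implementation. In particular, taking the quantile over the absolute off-diagonal correlations is an assumption.

```r
set.seed(10)
X <- matrix(rnorm(200), nrow = 20)  # n = 20 observations, p = 10 variables
p <- ncol(X)

# Absolute pairwise correlations, ignoring each variable's
# correlation with itself.
R <- abs(cor(X))
diag(R) <- 0

# Quantile-based cutoff over the unique variable pairs
# (assumed analogue of threshold.q = 0.9).
threshold.q <- 0.9
cutoff <- quantile(R[upper.tri(R)], probs = threshold.q)

# Neighbours of X_i: variables whose absolute correlation with X_i
# exceeds the cutoff. Variables without neighbours get integer(0).
nb.list <- lapply(seq_len(p), function(i) which(R[i, ] > cutoff))
```

An absolute threshold (the analogue of `threshold.abs`) would replace `cutoff` with a fixed value such as 0.5 instead of a quantile.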

ncomp

Optional. An integer specifying the number of components to use in the PLS regression. Defaults to NULL in the signature, in which case 2 components are used.

sparsity

Optional. A numeric value between 0 and 1 specifying the sparsity level in the PLS regression. Default is 1 (no sparsity).

Value

An \(n \times p\) matrix of generated knockoff variables.

References

Yang, Guannan, et al. "PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection." bioRxiv (2024): 2024-08.

Examples

set.seed(10)
X <- matrix(rnorm(100), nrow = 10)
Xk <- create.pls.knockoff(X = X, ncomp = 3)