
Generate a knockoff variable set with PLSKO using PLS regression

create.pls.knockoff(
  X,
  nb.list = NULL,
  threshold.abs = NULL,
  threshold.q = 0.9,
  ncomp = NULL,
  sparsity = 1
)

Arguments

X

A numeric matrix or data frame. The original design data matrix with n observations as rows and p variables as columns.

nb.list

Optional. Either a list of length p or a p × p adjacency matrix that defines the neighbourhood structure of the variables. In the list form, elements are ordered from X_1 to X_p, and the i-th element contains the indices of the neighbours of X_i, or NULL if X_i has no neighbours. In the matrix form, M must be symmetric and binary, with M_ij = 1 when X_i and X_j are neighbours and M_ij = 0 otherwise, including on the diagonal (i = j). If not provided or NULL, neighbourhoods are determined from the correlations between variables.

threshold.abs

Optional. A numeric value between 0 and 1 specifying an absolute correlation threshold used to define neighbourhoods.

threshold.q

Optional. A numeric value between 0 and 1 indicating the quantile of the correlation values to use as a threshold. Default is 0.9.

ncomp

Optional. An integer specifying the number of components to use in the PLS regression. Default is 2.

sparsity

Optional. A numeric value between 0 and 1 specifying the sparsity level in the PLS regression. Default is 1 (no sparsity).
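When nb.list is NULL, neighbourhoods are derived from correlations. This page does not spell out the exact rule used inside create.pls.knockoff, so the sketch below rests on an assumption: variables i and j are treated as neighbours when |cor(X_i, X_j)| exceeds a cutoff, which is threshold.abs if supplied and otherwise the threshold.q quantile of the off-diagonal absolute correlations. The helper corr_neighbours is hypothetical and not part of the package; it only illustrates how a quantile-based cutoff turns a correlation matrix into a binary adjacency matrix of the form accepted by nb.list.

```r
# Hypothetical helper (NOT part of PLSKO): build a binary adjacency matrix
# from absolute correlations. Assumes threshold.abs takes precedence over
# threshold.q when both are given; the package's actual rule may differ.
corr_neighbours <- function(X, threshold.abs = NULL, threshold.q = 0.9) {
  R <- abs(cor(X))                 # p x p matrix of absolute correlations
  diag(R) <- 0                     # a variable is never its own neighbour
  off_diag <- R[upper.tri(R)]      # each variable pair counted once
  cutoff <- if (!is.null(threshold.abs)) {
    threshold.abs
  } else {
    unname(quantile(off_diag, probs = threshold.q))
  }
  (R > cutoff) * 1                 # 1 = neighbours, 0 = not neighbours
}

set.seed(10)
X <- matrix(rnorm(100), nrow = 10)
M <- corr_neighbours(X, threshold.q = 0.9)
rowSums(M)   # number of neighbours assigned to each variable
```

With p = 10 there are 45 variable pairs, so a 0.9 quantile cutoff marks the 5 most strongly correlated pairs as neighbours; raising threshold.q makes neighbourhoods sparser.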

Value

An n × p matrix of generated knockoff variables.

References

Yang, Guannan, et al. "PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection." bioRxiv (2024): 2024-08.

Examples

set.seed(10)
X <- matrix(rnorm(100), nrow = 10)           # n = 10 observations, p = 10 variables
Xk <- create.pls.knockoff(X = X, ncomp = 3)  # n x p knockoff matrix, 3 PLS components
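The nb.list argument accepts either of the two forms described under Arguments. The minimal base-R sketch below builds both for an illustrative structure chosen here, not taken from the package: X_1, ..., X_9 form a chain in which each variable neighbours the adjacent ones, and X_10 has no neighbours. It converts the adjacency matrix into the equivalent list, keeping NULL for the isolated variable. The names M and nb_list are arbitrary, and the call to create.pls.knockoff is left commented out so the snippet runs without PLSKO installed.

```r
p <- 10

# Adjacency-matrix form: symmetric, binary, zero diagonal.
# Illustrative structure: X_1, ..., X_9 form a chain; X_10 is isolated.
M <- matrix(0, nrow = p, ncol = p)
for (i in 1:8) {
  M[i, i + 1] <- 1
  M[i + 1, i] <- 1
}

# List form: element i holds the neighbour indices of X_i, or NULL if none.
# lapply() keeps NULL elements, so the list still has length p.
nb_list <- lapply(seq_len(p), function(i) {
  nb <- which(M[i, ] == 1)
  if (length(nb) == 0) NULL else nb
})

nb_list[[1]]   # returns 2
nb_list[[5]]   # returns c(4, 6)
nb_list[[10]]  # returns NULL

# Either form could then be passed to the generator (requires PLSKO):
# set.seed(10)
# X <- matrix(rnorm(100), nrow = 10)
# Xk <- create.pls.knockoff(X = X, nb.list = nb_list, ncomp = 3)
```

Building the list from the matrix (rather than by hand) guarantees that the two forms describe the same symmetric neighbourhood structure.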