Generate a knockoff variable set with sparse PLS (SPLS) regression

create.zpls.knockoff(
  X,
  ncomp = NULL,
  eta = 0,
  nb.list = NULL,
  threshold.abs = NULL,
  threshold.q = 0.9
)

Arguments

X

A numeric matrix or data frame. The original design data matrix with \(n\) observations as rows and \(p\) variables as columns.

ncomp

Optional. An integer specifying the number of components to use in the SPLS regression. If NULL (the default), 2 components are used.

eta

Optional. A numeric value between 0 and 1 specifying the sparsity level in the SPLS regression. Default is 0 (no sparsity).
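To illustrate what eta controls, the sketch below applies univariate soft thresholding to a direction vector \(z\). We assume, without confirmation from this page, that this is the rule spls::spls applies when the response (here a single \(X_j\)) is univariate: entries with magnitude below eta times the largest magnitude become zero, and the remaining entries shrink toward zero. The helper soft_threshold_direction() is hypothetical and not part of this package.

```r
# Hypothetical helper (assumed SPLS rule for a univariate response):
# w_k = sign(z_k) * max(|z_k| - eta * max|z|, 0)
soft_threshold_direction <- function(z, eta) {
  # eta = 1 would zero every entry, so restrict to [0, 1)
  stopifnot(eta >= 0, eta < 1)
  thr <- eta * max(abs(z))
  pmax(abs(z) - thr, 0) * sign(z)
}

z <- c(0.9, -0.5, 0.2, -0.05)
w0 <- soft_threshold_direction(z, eta = 0)    # eta = 0: z is returned unchanged
w3 <- soft_threshold_direction(z, eta = 0.3)  # threshold 0.27: 0.63 -0.23 0 0
```

With eta = 0 no entry is removed, which matches the "no sparsity" default; raising eta zeroes progressively more of the weaker loadings.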

nb.list

Optional. Either a list of length \(p\) or a \(p \times p\) adjacency matrix that defines the neighbourhoods of the variables. A list of length \(p\) gives the neighbours' indices for each variable from \(X_1\) to \(X_p\) in order: the \(i^{th}\) element contains the indices of the neighbours of \(X_i\), or NULL when \(X_i\) has no neighbours. An adjacency matrix must be symmetric with binary entries, where \(M_{ij} = 1\) when \(X_i\) and \(X_j\) are neighbours and \(M_{ij} = 0\) otherwise, including on the diagonal (\(i = j\)). If not provided or NULL, the neighborhoods are determined from the correlations among the variables (see Details).
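The two accepted forms of nb.list carry the same information. As a minimal sketch, the hypothetical helper adj_to_nb_list() (not part of this package) converts a symmetric binary adjacency matrix into the list form, checking the constraints stated above along the way.

```r
# Hypothetical helper: adjacency matrix -> list form of nb.list.
# Element i holds the indices of X_i's neighbours, or NULL if there are none.
adj_to_nb_list <- function(M) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M), isSymmetric(M),
            all(M %in% c(0, 1)), all(diag(M) == 0))
  lapply(seq_len(nrow(M)), function(i) {
    nb <- which(M[i, ] == 1)
    if (length(nb) == 0) NULL else nb
  })
}

# p = 4: X1-X2 and X2-X3 are neighbours, X4 has no neighbours
M <- matrix(0, 4, 4)
M[1, 2] <- M[2, 1] <- 1
M[2, 3] <- M[3, 2] <- 1
nb <- adj_to_nb_list(M)  # list(2L, c(1L, 3L), 2L, NULL)
```

lapply() keeps NULL elements, so the result always has length \(p\) even when some variables have no neighbours.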

threshold.abs

Optional. A numeric value between \(0\) and \(1\) specifying an absolute correlation threshold used to define neighborhoods. When given, it is used instead of threshold.q.

threshold.q

Optional. A numeric value between 0 and 1 giving the quantile of the absolute correlation values to use as the threshold. Default is 0.9.

Value

An \(n \times p\) matrix of generated knockoff variables.

Details

Neighborhood Generation:

When nb.list is NULL, neighborhoods are determined from the correlations among the columns of X:

  • If threshold.abs is given: that absolute correlation value is used directly as the threshold.

  • If threshold.q is given: the threshold is set to the threshold.q quantile of the absolute correlation values.

  • If neither is provided: the function defaults to the 90th percentile of the absolute correlation values, which corresponds to using the strongest 10% of correlations to define neighborhoods.

Fitted Values:

The fitted value of each \(X_j\) is calculated with spls::spls; see spls::spls() for more details.
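The threshold rules above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: we assume only the off-diagonal absolute correlations enter the quantile, and that two variables are neighbours when their absolute correlation is at least the threshold. The helper correlation_neighbours() is hypothetical.

```r
# Hypothetical sketch of correlation-based neighbourhoods.
# Assumptions: quantile taken over off-diagonal |cor| values;
# neighbours when |cor| >= threshold; threshold.abs takes precedence.
correlation_neighbours <- function(X, threshold.abs = NULL, threshold.q = 0.9) {
  R <- abs(cor(X))
  off_diag <- R[upper.tri(R)]
  thr <- if (!is.null(threshold.abs)) {
    threshold.abs
  } else {
    unname(quantile(off_diag, probs = threshold.q))
  }
  M <- (R >= thr) * 1  # binary adjacency matrix
  diag(M) <- 0         # a variable is not its own neighbour
  list(threshold = thr, adjacency = M)
}

set.seed(10)
X <- matrix(rnorm(200), nrow = 20)  # n = 20, p = 10, so 45 variable pairs
nb_q <- correlation_neighbours(X)                          # 90th percentile rule
nb_abs <- correlation_neighbours(X, threshold.abs = 0.5)   # fixed cut-off
n_pairs_q <- sum(nb_q$adjacency) / 2  # 5 of the 45 pairs (top 10%)
```

The resulting adjacency matrix has the same form accepted by nb.list, so a neighbourhood structure built this way could also be inspected or edited before being passed in explicitly.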

References

Yang, Guannan, et al. "PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection." bioRxiv (2024): 2024-08.

Examples

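set.seed(10)
# Simulated design matrix: n = 10 observations, p = 10 variables
X <- matrix(rnorm(100), nrow = 10)
# SPLS-based knockoffs with 3 components and sparsity level eta = 0.3
Xk <- create.zpls.knockoff(X = X, ncomp = 3, eta = 0.3)
dim(Xk)  # n x p, the same as X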