create.pls.Rd
This function generates knockoff variables using Partial Least Squares (PLS) regression, following the PLSKO algorithm. It is intended for high-dimensional data.
create.pls(
X,
n_ko = 1,
ncomp = NULL,
sparsity = 1,
nb.list = NULL,
threshold.abs = NULL,
threshold.q = 0.9,
verbose = FALSE
)
A numeric matrix or data frame. The original design data matrix with \(n\) observations as rows and \(p\) variables as columns.
An integer specifying the number of knockoff copies (matrices) to generate. Default is 1.
Optional. An integer specifying the number of components to use in the PLS regression. Default is NULL, in which case the number of components is chosen empirically.
Optional. A numeric value between 0 and 1 specifying the sparsity level in the PLS regression. Default is 1 (no sparsity).
Optional. A list of length \(p\) or a \(p \times p\) adjacency matrix that defines the neighbor relationships among variables.
A list of length \(p\) should contain the neighbor indices of each variable, in order from \(X_1\) to \(X_p\): the \(i^{th}\) element holds the indices of the neighbors of \(X_i\), or NULL when \(X_i\) has no neighbors.
An adjacency matrix should be symmetric with binary elements. \(M_{ij} = 1\) indicates that \(X_i\) and \(X_j\) are neighbors; \(M_{ij} = 0\) indicates that they are not. Diagonal entries are 0.
If not provided or NULL, neighborhoods are determined based on correlations.
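The two accepted forms of the neighborhood argument can be illustrated with a small base R sketch. The toy size `p = 4` and the names `nb_list` and `adj` are hypothetical, chosen only for illustration; both objects encode the same relationships (\(X_1\) is a neighbor of \(X_2\) and \(X_3\), and \(X_4\) has no neighbors).

```r
# Hypothetical toy example with p = 4 variables.
p <- 4

# List form: element i holds the indices of the neighbors of X_i.
# vector("list", p) starts with every element set to NULL (no neighbors).
nb_list <- vector("list", p)
nb_list[[1]] <- c(2L, 3L)
nb_list[[2]] <- 1L
nb_list[[3]] <- 1L
# nb_list[[4]] is left as NULL: X_4 has no neighbors.

# Matrix form: symmetric, binary, zero diagonal.
adj <- matrix(0L, nrow = p, ncol = p)
for (i in seq_len(p)) {
  for (j in nb_list[[i]]) {
    adj[i, j] <- 1L
  }
}

# Sanity checks matching the requirements stated above.
stopifnot(isSymmetric(adj), all(diag(adj) == 0L), all(adj %in% c(0L, 1L)))
```

Either object could then be supplied as `nb.list`.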
Optional. A value between \(0\) and \(1\) specifying an absolute correlation threshold for defining neighborhoods. Default is NULL.
Optional. A numeric value between 0 and 1 indicating the quantile of the correlation values to use as a threshold for defining neighborhoods. Default is 0.9.
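As a rough illustration of a quantile-based correlation threshold, the base R sketch below marks two variables as neighbors when their absolute correlation reaches the chosen quantile of all pairwise absolute correlations. This is an assumption about the general idea, not a description of the function's internal rule, and the names `threshold_q`, `abs_cor`, `cutoff`, `adj`, and `neighbors` are hypothetical.

```r
set.seed(1)

# Simulated data: 50 observations, 6 variables; X_2 is made to track X_1 closely.
X <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)
X[, 2] <- X[, 1] + 0.1 * rnorm(50)

threshold_q <- 0.9  # hypothetical stand-in for the quantile argument

# Absolute pairwise correlations, ignoring the diagonal.
abs_cor <- abs(cor(X))
diag(abs_cor) <- 0

# Cutoff: the chosen quantile of the off-diagonal absolute correlations.
cutoff <- quantile(abs_cor[upper.tri(abs_cor)], probs = threshold_q)

# Binary adjacency matrix and the equivalent list of neighbor indices.
adj <- (abs_cor >= cutoff) * 1L
neighbors <- lapply(seq_len(ncol(X)), function(i) which(adj[i, ] == 1L))
```

An absolute threshold would work the same way, with `cutoff` set directly to a fixed value between 0 and 1 instead of a quantile.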
Logical. Whether to display progress information during knockoff generation. Default is FALSE.
A list of generated knockoff matrices, where each matrix has \(n\) rows (observations) and \(p\) columns (variables).
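A minimal usage sketch on simulated data follows. It assumes the package providing `create.pls()` is already attached, so the call is guarded and skipped otherwise. The data dimensions and the argument values are arbitrary choices for illustration.

```r
set.seed(123)

# Simulated design matrix: n = 100 observations, p = 20 variables.
n <- 100
p <- 20
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Run only when create.pls() is available in the current session.
if (exists("create.pls", mode = "function")) {
  # Request two knockoff copies, with neighborhoods derived from correlations.
  ko <- create.pls(X, n_ko = 2, ncomp = 2, threshold.q = 0.9)

  # Per the description above, the result is a list of n-by-p matrices.
  print(length(ko))
  print(dim(ko[[1]]))
}
```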
Yang, Guannan, et al. "PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection." bioRxiv (2024): 2024-08.
Other create: create.fixed(), create.gaussian(), create.pc(), create.second_order(), create.seq(), create.shrink_Gaussian(), create.sparse_Gaussian(), create.sparse_seq(), create.zpls()