Create PC Knockoffs

A sequential algorithm to create non-parametric knockoffs based on principal component regression and residual permutation.

create.pc.knockoff(X, pc.num)

Arguments

X

The original design matrix.

pc.num

The number of principal components to be used for generating knockoff matrices.

Value

A principal component knockoff matrix.

Details

For each original variable \(\mathbf{x}_j\), where \(j = 1, \ldots, p\), the following steps are performed to generate knockoff variables:

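  1. Conduct PCA on the matrix \(\left(\mathbf{X}_{-j}, \mathbf{Z}_{1: j-1}\right)\), that is, \(\mathbf{X}\) without its \(j\)-th column combined with the knockoffs already generated for variables \(1, \ldots, j-1\).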

  2. For a fixed \(K\) (the pc.num argument), regress \(\mathbf{x}_j\) on \(K\) PCs to obtain the fitted values \(\hat{\mathbf{x}}_j\). The choice of \(K\) involves a tradeoff: the larger \(K\) is, the more closely the knockoffs resemble the original variables. This results in a smaller type I error but weaker power of the test.

  3. Compute a residual vector \(\varepsilon_n=\left(\mathbf{x}_j-\hat{\mathbf{x}}_j\right)\).

  4. Permute \(\varepsilon_n\) randomly. Denote the permuted vector as \(\varepsilon_n^*\).

  5. Set \(\mathbf{z}_j=\hat{\mathbf{x}}_j+\varepsilon_n^*\) and append it to the current knockoff matrix \(\mathbf{Z}_{1: j-1}\) to form \(\mathbf{Z}_{1: j}\).
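
The five steps above can be sketched in base R as follows. This is a minimal illustration, not the package's own implementation: the function name sketch_pc_knockoff is hypothetical, and the use of prcomp with centering, an intercept in the regression, and the first K components are assumptions made for the sketch. It also assumes X has at least two rows and two columns.

```r
# Hypothetical helper (not part of the package): illustrates steps 1-5.
sketch_pc_knockoff <- function(X, pc.num) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  Z <- matrix(NA_real_, nrow = n, ncol = p)

  for (j in seq_len(p)) {
    # Step 1: PCA on X without column j, plus the knockoffs built so far.
    W <- cbind(X[, -j, drop = FALSE], Z[, seq_len(j - 1), drop = FALSE])
    pca <- prcomp(W, center = TRUE, scale. = FALSE)

    # Step 2: regress x_j on K principal components (assumed: the first K).
    K <- min(pc.num, ncol(pca$x))
    design <- cbind(1, pca$x[, seq_len(K), drop = FALSE])
    fit <- lm.fit(x = design, y = X[, j])
    x_hat <- fit$fitted.values

    # Step 3: residual vector.
    eps <- fit$residuals

    # Step 4: random permutation of the residuals.
    eps_star <- eps[sample.int(n)]

    # Step 5: knockoff for variable j, stored next to the earlier knockoffs.
    Z[, j] <- x_hat + eps_star
  }
  Z
}

set.seed(10)
X <- matrix(rnorm(100), nrow = 10)
Z <- sketch_pc_knockoff(X, pc.num = 5)
```

Because the residuals are only reordered, each knockoff column in this sketch keeps the same mean as its original column, while its individual entries differ.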

References

Jiang, Tao, Yuanyuan Li, and Alison A. Motsinger-Reif. "Knockoff boosted tree for model-free variable selection." Bioinformatics 37.7 (2021): 976-983.

Shen, A., et al. "False discovery rate control in cancer biomarker selection using knockoffs." Cancers 11 (2019): 744.

Examples

set.seed(10)
X <- matrix(rnorm(100), nrow = 10)
Xk <- create.pc.knockoff(X = X, pc.num = 5)
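
The effect of pc.num can be explored without calling the package at all. The base-R sketch below is an illustration under assumptions (prcomp for the PCA, an intercept in the regression, the first K components), not the package's internals. For the first variable it computes the R-squared of the regression on K components for several values of K. A higher R-squared means smaller residuals, so a knockoff built from fitted values plus permuted residuals stays closer to the original variable.

```r
set.seed(10)
X <- matrix(rnorm(100), nrow = 10)

j <- 1
# For j = 1 no knockoffs exist yet, so the PCA uses only the other columns of X.
pca <- prcomp(X[, -j, drop = FALSE], center = TRUE, scale. = FALSE)

# R-squared of regressing x_j on the first K components, for K = 1, ..., 8.
r2 <- sapply(1:8, function(K) {
  design <- cbind(1, pca$x[, seq_len(K), drop = FALSE])
  fit <- lm.fit(x = design, y = X[, j])
  1 - sum(fit$residuals^2) / sum((X[, j] - mean(X[, j]))^2)
})

round(r2, 3)
```

Since the regressions are nested, the R-squared values never decrease as K grows, which is the mechanism behind the tradeoff described under Details.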