Introduction

In this vignette we demonstrate the main functionality of the knockofftools package: generating data sets, simulating knockoffs (model-X and sequential), applying the multiple knockoff filter for variable selection, and visualizing the selections.

Let’s first recall how the knockoff variable selection methodology works in a nutshell:

  1. Simulate a knockoff copy $\tilde{X}$ of the original covariate data $X$.
  2. Compute feature statistics $W_j = |\beta_j| - |\tilde{\beta}_j|$ from an aggregated regression of $Y$ on $X$ and $\tilde{X}$. Large, positive statistics $W_j$ indicate association of $X_j$ with $Y$.
  3. For FDR control, use the knockoff+ procedure to select the variables $j$ that satisfy $W_j \geq \tau_+$, where $$\tau_+ = \min\left\{t > 0 : \frac{1 + |\{j : W_j \leq -t\}|}{|\{j : W_j \geq t\}|} \leq q\right\}.$$ This workflow selects variables associated with the response with guaranteed control of the false discovery rate, $\mathrm{FDR} \leq q$.
library(zKnock)
#> Warning: replacing previous import 'spls::spls' by 'mixOmics::spls' when
#> loading 'zKnock'
#> Warning: replacing previous import 'mixOmics::spls' by 'spls::spls' when
#> loading 'zKnock'
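
The three steps recalled above can be sketched in base R, without relying on any package function. This is only an illustrative toy, not the package's implementation: the data are simulated with independent standard-normal covariates (an assumption under which a fresh independent draw is a valid knockoff copy), the regression is an ordinary `lm` fit, and the names `X_k` and `knockoff_plus_threshold` are hypothetical helpers introduced here.

```r
set.seed(1)
n <- 500   # observations
p <- 20    # covariates
q <- 0.2   # target FDR level

# Toy data (assumption): independent N(0, 1) covariates,
# only the first 5 of which are associated with y.
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 5), rep(0, p - 5))
y <- as.vector(X %*% beta + rnorm(n))

# Step 1: knockoff copy. Because the columns of X are independent N(0, 1),
# an independent draw from the same distribution is a valid knockoff.
# Correlated covariates need a proper knockoff sampler instead.
X_k <- matrix(rnorm(n * p), n, p)

# Step 2: feature statistics W_j = |beta_j| - |beta_tilde_j| from a joint
# regression of y on the original and knockoff covariates.
fit <- lm(y ~ cbind(X, X_k))
b <- unname(coef(fit)[-1])               # drop the intercept
W <- abs(b[1:p]) - abs(b[(p + 1):(2 * p)])

# Step 3: knockoff+ threshold, the smallest t > 0 among the |W_j| such that
# (1 + #{j : W_j <= -t}) / #{j : W_j >= t} <= q.
knockoff_plus_threshold <- function(W, q) {
  ts <- sort(unique(abs(W[W != 0])))
  ratio <- vapply(
    ts,
    function(t) (1 + sum(W <= -t)) / max(1, sum(W >= t)),
    numeric(1)
  )
  ok <- which(ratio <= q)
  if (length(ok) == 0) Inf else ts[ok[1]]  # Inf means nothing is selected
}

tau <- knockoff_plus_threshold(W, q)
selected <- which(W >= tau)
selected
```

The candidate thresholds are restricted to the observed values of $|W_j|$, since the ratio only changes at those points, and the `max(1, ...)` guard merely avoids division by zero. When no threshold meets the target level, the function returns `Inf` and the selection is empty.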

Data generation

References