The zKnock package

Introduction

In this vignette we demonstrate the main functionality of the zKnock package. In particular, we demonstrate functions for generating data sets, simulating knockoffs (MX and sequential), applying the multiple knockoff filter for variable selection, and visualizing selections.

Let’s first recall how the knockoff variable selection methodology works in a nutshell:

Simulate a knockoff copy $\tilde{X}$ of the original covariate data $X$.
Compute feature statistics $W_j=|\beta_j|-|\tilde{\beta}_j|$ from an aggregated regression of $Y$ on $X$ and $\tilde{X}$. Large, positive statistics $W_j$ indicate association of $X_j$ with $Y$.
For FDR control, use the knockoffs $+$ procedure to select variables $j$ that fulfill $W_j \geq \tau_+$, where $\tau_+ = \min\left\{t>0 : \frac{1 + |\{j : W_j \leq -t\}|}{|\{j : W_j \geq t\}|} \leq q\right\}.$ This workflow selects variables associated with the response with guaranteed control of the false discovery rate, $FDR \leq q$.
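The selection step can be sketched in a few lines of base R. This is a minimal illustration, not the implementation used inside zKnock: the helper name `knockoff_plus_threshold` and the toy statistics `W` are made up for this example. It searches the observed values $|W_j|$ for the smallest $t$ at which the estimated false discovery proportion, $(1 + \#\{j : W_j \leq -t\}) / \max(1, \#\{j : W_j \geq t\})$, drops to $q$ or below.

```r
# Hypothetical helper (not part of zKnock): knockoffs+ threshold.
# Returns the smallest t among the nonzero |W_j| such that
# (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
knockoff_plus_threshold <- function(W, q) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    fdp_hat <- (1 + sum(W <= -t)) / max(1, sum(W >= t))
    if (fdp_hat <= q) return(t)
  }
  Inf  # no candidate meets the target level, so nothing is selected
}

# Toy feature statistics (made-up numbers, for illustration only)
W <- c(3.1, 2.4, -0.2, 1.8, 0.1, -0.5, 2.9, 0.0, 1.2, 2.2)

tau <- knockoff_plus_threshold(W, q = 0.2)
selected <- which(W >= tau)  # indices of the selected variables
```

With these toy values the estimate first falls to 1/6 at $t = 1.2$, so six variables are selected. With a stricter target such as `q = 0.1`, no candidate qualifies, the threshold is `Inf`, and the selection is empty; the extra 1 in the numerator is what makes knockoffs $+$ conservative when few variables are selected.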

library(zKnock)
#> Warning: replacing previous import 'spls::spls' by 'mixOmics::spls' when
#> loading 'zKnock'
#> Warning: replacing previous import 'mixOmics::spls' by 'spls::spls' when
#> loading 'zKnock'

October 14, 2024
