This function runs the model-Y Knockoffs procedure from start to finish, selecting important responses.

YKnock_filter(
  X,
  Y,
  knockoffs = create_YKnock,
  statistic = stat_modelY_coef,
  fdr = 0.1,
  offset = 1
)

Arguments

X

n-by-p matrix or data frame of predictors.

Y

n-by-r matrix or data frame of responses.

knockoffs

method used to construct knockoffs for the \(Y\) variables. It must be a function taking an n-by-r matrix as input and returning an n-by-r matrix of knockoff variables. By default, approximate model-Y Gaussian knockoffs are used.

statistic

function used to compute the statistics that assess variable importance. By default, stat_modelY_coef is used, a lasso statistic with cross-validation. See the Details section for more information.

fdr

target false discovery rate (default: 0.1).

offset

either 0 or 1 (default: 1). This is the offset used to compute the rejection threshold on the statistics. The value 1 yields a slightly more conservative procedure ("knockoffs+") that controls the false discovery rate (FDR) according to the usual definition, while an offset of 0 controls a modified FDR.
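To make the role of offset concrete, the sketch below implements the standard knockoff thresholding rule in base R: the threshold is the smallest t among the observed |W_j| for which (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr. It assumes that YKnock_filter applies this usual rule, which this page does not state explicitly. The function name knockoff_threshold_sketch and the toy statistics W are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of YKnock): standard knockoff threshold.
# W is a vector of importance statistics; large positive values are evidence
# for a variable, negative values are evidence for its knockoff.
knockoff_threshold_sketch <- function(W, fdr = 0.1, offset = 1) {
  stopifnot(offset %in% c(0, 1))
  # Candidate thresholds: zero and the observed magnitudes, in increasing order
  candidates <- sort(c(0, abs(W)))
  # Estimated false discovery proportion at each candidate threshold
  ratio <- vapply(
    candidates,
    function(t) (offset + sum(W <= -t)) / max(1, sum(W >= t)),
    numeric(1)
  )
  ok <- which(ratio <= fdr)
  # Smallest admissible threshold, or Inf if none exists (nothing is selected)
  if (length(ok) > 0) candidates[ok[1]] else Inf
}

# Toy statistics: nine positive values and one negative value
W <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, -1)
t_plus <- knockoff_threshold_sketch(W, fdr = 0.1, offset = 1)
t_mod <- knockoff_threshold_sketch(W, fdr = 0.1, offset = 0)
selected_plus <- which(W >= t_plus)
selected_mod <- which(W >= t_mod)
```

With offset = 1 the ratio is never smaller than 1 divided by the number of selected variables, so at fdr = 0.1 at least ten variables must pass the threshold before anything is selected. In this toy input only nine statistics are positive, so the offset = 1 selection is empty while the offset = 0 selection is not.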

Value

An object of class "YKnock". This object is a list containing at least the following components:

Y

matrix of original responses

Yk

matrix of knockoff responses

statistic

computed test statistics

threshold

computed selection threshold

selected

named vector of selected responses

Details

This function creates the knockoffs, computes the importance statistics, and selects responses. It is the main entry point for the YKnock package.

The parameter knockoffs controls how knockoff variables are created. By default, the model-Y scenario is assumed and a multivariate normal distribution is fitted to the original variables \(Y\).
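As an illustration of what fitting a multivariate normal distribution and sampling knockoffs can look like, here is a minimal base-R sketch of the usual Gaussian knockoff sampler with an equicorrelated choice of the diagonal matrix D. It is not the implementation of create_YKnock, whose internals are not described here. The function name gaussian_knockoffs_sketch, the 0.999 shrink factor, and the toy data are assumptions made for the example. The sketch also assumes n is larger than r so that the sample covariance is invertible.

```r
# Hypothetical sketch (not the YKnock implementation): sample Gaussian
# knockoffs for the columns of Y from a fitted multivariate normal.
gaussian_knockoffs_sketch <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  r <- ncol(Y)
  mu <- colMeans(Y)   # fitted mean
  Sigma <- cov(Y)     # fitted covariance (must be invertible)

  # Equicorrelated construction on the correlation scale:
  # s = min(2 * lambda_min, 1), shrunk slightly so the conditional
  # covariance below stays positive definite.
  lambda_min <- min(eigen(cov2cor(Sigma), symmetric = TRUE,
                          only.values = TRUE)$values)
  s <- 0.999 * min(2 * lambda_min, 1) * diag(Sigma)
  D <- diag(s, nrow = r)

  # Conditional law of the knockoffs given Y:
  #   mean = Y - (Y - mu) Sigma^{-1} D,  cov = 2D - D Sigma^{-1} D
  SigmaInvD <- solve(Sigma, D)
  cond_mean <- Y - sweep(Y, 2, mu) %*% SigmaInvD
  cond_cov <- 2 * D - D %*% SigmaInvD
  cond_cov <- (cond_cov + t(cond_cov)) / 2  # remove numerical asymmetry

  # Draw one knockoff row per observation
  cond_mean + matrix(rnorm(n * r), n, r) %*% chol(cond_cov)
}

# Toy usage: 200 observations of 5 responses
set.seed(1)
Y_toy <- matrix(rnorm(200 * 5), 200, 5)
Yk_toy <- gaussian_knockoffs_sketch(Y_toy)
```

The function takes an n-by-r matrix and returns an n-by-r matrix, which is the shape required of the knockoffs argument. Whether a sampler like this one is appropriate for a given data set, in particular when r exceeds n, is a separate question that the sketch does not address.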

The default importance statistic is stat_modelY_coef.

It is possible to provide custom functions for the knockoff constructions or the importance statistics. Some examples can be found in the vignette.

References

Identification of Significant Gene Expression Changes in Multiple Perturbation Experiments using Knockoffs

Tingting Zhao, Guangyu Zhu, Patrick Flaherty. bioRxiv 2021.10.18.464822.

https://www.biorxiv.org/content/10.1101/2021.10.18.464822v1

Examples

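library(YKnock)

# Problem dimensions: r responses, p predictors, n observations
r = 100; p = 10; n = 40
m = 10  # number of important responses
rho = 0.2  # pairwise correlation between predictors
betaValue = 1  # magnitude of the nonzero coefficients
# Equicorrelated predictor covariance with unit variances
SigmaX = matrix(rho, p, p)
diag(SigmaX) = 1
# Only the first m responses depend on X; their coefficients are +/- betaValue
betaA = matrix(sample(c(-1, 1) * betaValue, size = m * p, replace = TRUE), nrow = m, ncol = p)
beta = matrix(0, r, p)
beta[1:m, ] = betaA
sigma = 1  # noise variance
X = matrix(rnorm(n * p), n) %*% chol(SigmaX)
Y = X %*% t(beta) + sqrt(sigma) * matrix(rnorm(n * r), n, r)
# The seed is set after the data are simulated, so it fixes only the
# randomness inside YKnock_filter; X and Y differ from run to run
set.seed(1)
result = YKnock_filter(X, Y)
print(result$selected)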
#> integer(0)