CAR-LASSO

https://arxiv.org/abs/2012.08397

Abstract

We need statistical models to describe interactions among microbes. The authors propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges representing conditional dependence. The model uses the Bayesian LASSO, so the solution is sparse.

[R package](https://github.com/YunyiShen/CAR-LASSO)

Introduction

What was lacking in microbiome analysis: statistical tools to simultaneously infer connections among microbes and their direct reactions to different environmental factors in a unified framework.

For the graphical model:

  1. Nodes represent variables.

  2. Edges represent conditional dependence between nodes; the absence of an edge represents conditional independence.

Intuitively, a multiresponse linear regression with a LASSO prior on the regression coefficients, combined with a graphical LASSO prior on the precision matrix, can provide sparse regression coefficients between predictors and responses, while the sparse graphical model estimates a sparse graphical structure among the responses. However, in that setup the regression coefficients represent marginal effects of the predictors rather than conditional effects.

Goals:

  • Estimate the graphical structure between predictors (environment) and responses (microbes).

  • Estimate the graphical structure among responses, while keeping the conditional interpretation of both sets of parameters.

CAR-LASSO simultaneously estimates the conditional effects of a set of predictors on the responses and the connections among the responses. The model is represented by a chain graph with two sets of nodes, $\{predictors\}$ and $\{responses\}$: directed edges between a predictor and a response represent conditional links, and undirected edges among responses represent conditional dependence.

  • Guarantees sparse solution - Bayesian LASSO.

  • Fixed penalty - posterior is log-concave.

  • The adaptive extension allows different shrinkage for different edges, to incorporate edge-specific knowledge into the model, and the normal model can be used to build hierarchical structures.

  • Handles both small and large data through a Gibbs sampling algorithm.

Methods

Let $Y_i \in \mathbb{R}^k$ be the multivariate response with $k$ entries, for $i = 1, \dots, n$ observations.

Let $X_i \in \mathbb{R}^{1 \times p}$ be the row vector of predictors for $i = 1, \dots, n$; assume the design matrix is standardized so that each column has mean 0 and the same standard deviation.

Let $Y_i$ follow a normal distribution with mean vector $\Omega^{-1}(B^T X_i^T + \mu)$ and precision matrix $\Omega \in \mathbb{R}^{k \times k}$ (positive definite), where $B \in \mathbb{R}^{p \times k}$ holds the regression coefficients connecting the responses to the predictors and $\mu \in \mathbb{R}^k$ is the intercept. The authors use the transpose $B^T X_i^T \in \mathbb{R}^{k \times 1}$ because samples are encoded as row vectors in the design matrix, while by convention multivariate normal samples are column vectors.
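As a sanity check on this setup, a minimal generative sketch in Python/numpy (all dimensions and values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate from the model: Y_i ~ N(Omega^{-1}(B^T X_i^T + mu), Omega^{-1}).
# All dimensions and values are made up (p = 2 predictors, k = 3 responses).
p, k, n = 2, 3, 500
B = np.array([[1.0,  0.0, 0.0],
              [0.0, -0.5, 0.0]])            # sparse conditional effects
mu = np.zeros(k)
Omega = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])      # sparse precision among responses
Sigma = np.linalg.inv(Omega)                # covariance of Y given X

X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)              # standardize columns, as assumed above
means = (X @ B + mu) @ Sigma                # row-wise Omega^{-1}(B^T X_i^T + mu)
Y = means + rng.multivariate_normal(np.zeros(k), Sigma, size=n)
```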

The likelihood function of the model is:

$$p(Y_i \mid X_i, \mu, B, \Omega) \propto \exp\left[(B^T X_i^T + \mu)^T Y_i - \frac{1}{2} Y_i^T \Omega Y_i\right]$$
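To see that this matches the normal distribution stated above, complete the square in the exponent (writing $\eta_i = B^T X_i^T + \mu$):

$$\exp\left[\eta_i^T Y_i - \frac{1}{2} Y_i^T \Omega Y_i\right] \propto \exp\left[-\frac{1}{2}(Y_i - \Omega^{-1}\eta_i)^T \Omega (Y_i - \Omega^{-1}\eta_i)\right]$$

so $Y_i \mid X_i \sim N(\Omega^{-1}(B^T X_i^T + \mu),\, \Omega^{-1})$.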

$B$ encodes the conditional dependence between $Y$ and $X$: if $B_{jq} = 0$, then $X_j$ and $Y_q$ are conditionally independent given the rest. The off-diagonal entries of $\Omega$ encode the conditional dependence between $Y_q$ and $Y_{q'}$. By contrast, the regression coefficients in a multiresponse linear regression, $\tilde{B} = B\Omega^{-1}$, are marginal effects.
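A quick numeric illustration of the difference, with made-up values: here $B_{1,3} = 0$, so $X_1$ and $Y_3$ are conditionally independent, yet the marginal effect $(\tilde{B})_{1,3}$ is nonzero because the responses are chained together through $\Omega$:

```python
import numpy as np

# Made-up example: one predictor, three responses.
B = np.array([[1.0, 0.5, 0.0]])            # B[0, 2] = 0: no direct edge X1 -- Y3
Omega = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])     # sparse precision: Y1--Y2 and Y2--Y3 edges

B_marginal = B @ np.linalg.inv(Omega)      # tilde(B) = B Omega^{-1}
print(B_marginal.round(3))                 # entry [0, 2] is nonzero: a marginal
                                           # effect with no direct conditional edge
```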

Prior Specification:

Assume a Laplace prior on $B$ and a GLASSO prior on $\Omega$.

Full Model Specification

Note $I_{\Omega \in M^+}$ means $\Omega$ must be positive definite.
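As a hedged sketch of what the full specification looks like, using the standard Bayesian LASSO / graphical LASSO prior forms (the paper's exact parameterization, e.g., separate rates for diagonal and off-diagonal entries of $\Omega$, may differ):

$$p(B \mid \lambda_B) \propto \exp\left(-\lambda_B \sum_{j,q} |B_{jq}|\right), \qquad p(\Omega \mid \lambda_\Omega) \propto \exp\left(-\lambda_\Omega \|\Omega\|_1\right) I_{\Omega \in M^+}$$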

Algorithm

Algorithm for Gibbs Sampling

Hyperparameters: we must choose $\lambda_B$ and $\lambda_\Omega$.
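The sampler updates $B$, $\mu$, $\Omega$, and latent scale parameters in turn. As a hedged illustration of the underlying machinery (not the paper's CAR-LASSO sampler), here is a minimal Park & Casella (2008)-style Bayesian LASSO Gibbs sampler for a single response, using the Laplace-as-scale-mixture-of-normals augmentation:

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000):
    """Gibbs sampler for single-response Bayesian LASSO regression
    (Park & Casella, 2008). Illustrative only: CAR-LASSO augments a
    similar scheme to update B, mu, and Omega jointly."""
    n, p = X.shape
    beta, sigma2 = np.zeros(p), 1.0
    tau2 = np.ones(p)                        # latent scales of the Laplace mixture
    XtX, Xty = X.T @ X, X.T @ y
    samples = np.empty((n_iter, p))
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1/tau2)
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # sigma2 | rest ~ Inverse-Gamma
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        scale = resid @ resid / 2 + beta @ (beta / tau2) / 2
        sigma2 = scale / rng.gamma(shape)
        # 1/tau2_j | rest ~ Inverse-Gaussian (numpy calls it Wald)
        tau2 = 1.0 / rng.wald(np.sqrt(lam**2 * sigma2 / beta**2), lam**2)
        samples[it] = beta
    return samples

# Tiny synthetic check: sparse truth, shrunken posterior means.
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + rng.standard_normal(100)
print(bayesian_lasso_gibbs(X, y).mean(axis=0).round(2))
```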

Learning:

A continuous prior $\implies$ zero posterior probability that a parameter is exactly zero, so a separate rule is needed to decide which parameters to set to zero.

The amount of shrinkage is $1 - \pi$, where $\pi = \frac{\tilde{\theta}}{E_g(\theta \mid Y)}$: $\tilde{\theta}$ is the estimate of the parameter under the LASSO prior and the denominator is the posterior mean of the parameter under a non-shrinkage prior. Use $\pi > 0.5$ to decide that $\theta \neq 0$.
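A small sketch of applying the criterion, with made-up posterior summaries:

```python
import numpy as np

# Made-up posterior summaries for three edges: estimates under the LASSO
# prior vs. posterior means under a non-shrinkage (flat) prior.
theta_lasso = np.array([0.48, 0.03, -0.30])
theta_flat  = np.array([0.55, 0.40, -0.35])

pi = theta_lasso / theta_flat    # fraction of the signal surviving shrinkage
keep = pi > 0.5                  # decide theta != 0 only when pi > 0.5
print(pi.round(2), keep)         # middle edge is mostly shrunk away -> dropped
```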

Extensions

Adaptive LASSO - incorporates prior knowledge of independence between certain nodes through edge-specific shrinkage parameters.
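A plausible form of the adaptive prior (standard adaptive-LASSO style; the paper's exact parameterization may differ), with the edge-specific rate $\lambda_{jq}$ set large for edges believed absent:

$$p(B \mid \{\lambda_{jq}\}) \propto \exp\left(-\sum_{j,q} \lambda_{jq} |B_{jq}|\right)$$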

Examples

Example on human gut dataset.

CAR-LASSO's conditional effects can be more informative than marginal effects.

Discussion

Conditional dependence is important: the model distinguishes between marginal effects and conditional effects. The Bayesian model allows for easier extension to different types of responses. Graphical structure is also hard to learn when there is confounding within the structure itself.
