Multivariate Distribution and Inference Applications

https://www.sciencedirect.com/science/article/pii/S0307904X21005291

Summary: The paper introduces a multivariate family of distributions for multivariate count data with excess zeros, constructed as a multivariate extension of the univariate zero-inflated Bell distribution. Model parameters are estimated by traditional maximum likelihood estimation. The authors also develop an EM algorithm that computes the MLEs with closed-form expressions.

The univariate Bell family's probability mass function:

$$f_{Bell}(x; \alpha) := \Pr(X = x) = \exp(1-e^{W(\alpha)})\frac{W(\alpha)^{x}B_x}{x!}, \quad x = 0,1,\dots$$

where $\alpha > 0$, $W(\cdot)$ denotes the principal branch of the Lambert W function, and $B_x$ are the Bell numbers. We write $X \sim Bell(\alpha)$ to denote this univariate distribution. Its properties:

  • Belongs to one-parameter exponential family of distributions

  • It is infinitely divisible

  • Variance larger than mean: $E(X) = \alpha$ and $Var(X) = \alpha[1+W(\alpha)]$; since $W(\alpha) > 0$ for $\alpha > 0$, $Var(X) > E(X)$

The Bell distribution may therefore be suitable for overdispersed data, unlike the Poisson.
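As a quick numerical sanity check on the PMF above, here is a minimal sketch (the `lambert_w` Newton iteration and the Bell-triangle `bell_numbers` are my own helper implementations, not from the paper), verifying that the truncated mass sums to one and that the mean equals $\alpha$:

```python
import math

def lambert_w(a, tol=1e-12):
    """Principal branch of the Lambert W function for a > 0 (Newton iteration)."""
    w = math.log(1.0 + a)  # rough starting point
    for _ in range(100):
        e = math.exp(w)
        w_new = w - (w * e - a) / (e * (w + 1.0))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def bell_numbers(n):
    """Bell numbers B_0..B_n via the Bell triangle."""
    row, out = [1], [1]
    for _ in range(n):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
        out.append(row[0])
    return out

def bell_pmf(alpha, n_max=60):
    """Truncated PMF of Bell(alpha): exp(1 - e^W) * W^x * B_x / x!."""
    w = lambert_w(alpha)
    B = bell_numbers(n_max)
    c = math.exp(1.0 - math.exp(w))
    return [c * w**x * B[x] / math.factorial(x) for x in range(n_max + 1)]

pmf = bell_pmf(1.5)
mass = sum(pmf)                                # ~1
mean = sum(x * p for x, p in enumerate(pmf))   # ~alpha = 1.5
```

The normalization works because of the Bell-number generating function $\sum_x B_x t^x / x! = e^{e^t - 1}$, evaluated at $t = W(\alpha)$.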

Univariate zero-inflated Bell (ZIB) distribution. Let $D$ denote a degenerate distribution at a constant $c$, i.e. $\Pr(D = c) = 1$, written $D \sim Degenerate(c)$.

Let $D \sim Degenerate(0)$ and $X \sim Bell(\alpha)$, with $D$ and $X$ independent. The PMF of the ZIB distribution can then be expressed as

$$f_{ZIB}(y; \alpha, \omega) = \begin{cases} \omega + (1-\omega)e^{1-e^{W(\alpha)}} & y=0, \\ (1-\omega)e^{1-e^{W(\alpha)}}\dfrac{W(\alpha)^y B_y}{y!} & y=1,2,\dots \end{cases}$$

$$= \left[\omega+(1-\omega)e^{1-e^{W(\alpha)}}\right]\mathbb{I}(y=0) + \left[(1-\omega)e^{1-e^{W(\alpha)}}\frac{W(\alpha)^y B_y}{y!}\right]\mathbb{I}(y\neq 0)$$

where $\alpha > 0$, $\omega \in (0, 1)$, and $\mathbb{I}(\cdot)$ denotes the indicator function. ZIB is a mixture of a distribution degenerate at zero and a Bell distribution.

$Y \sim ZIB(\alpha, \omega)$ admits the following stochastic representation:

$$Y \overset{D}{=} ZX = \begin{cases} 0 & \text{with probability } \omega \\ X & \text{with probability } 1-\omega \end{cases}$$

where $Z \sim Bernoulli(1-\omega)$, $X \sim Bell(\alpha)$, and $Z \perp X$; $\overset{D}{=}$ means the random variables on both sides ($Y$ and $ZX$) have the same distribution. From this it is immediate that $E(Y) = E(Z)E(X) = (1-\omega)\alpha$ and $E(Y^2) = E(Z)E(X^2) = (1-\omega)\alpha[1+\alpha+W(\alpha)]$,

and $Var(Y) = (1-\omega)\alpha[1+\alpha\omega + W(\alpha)]$.

Zero inflation induces overdispersion that the Poisson model cannot accommodate, so the Bell-based model may be more suitable.
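The ZIB moment formulas can be checked numerically against a truncated version of the PMF above (a sketch; the Lambert W and Bell-number helpers are my own implementations, not from the paper):

```python
import math

def lambert_w(a, tol=1e-12):
    """Principal branch of the Lambert W function for a > 0 (Newton iteration)."""
    w = math.log(1.0 + a)
    for _ in range(100):
        e = math.exp(w)
        w_new = w - (w * e - a) / (e * (w + 1.0))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def bell_numbers(n):
    """Bell numbers B_0..B_n via the Bell triangle."""
    row, out = [1], [1]
    for _ in range(n):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
        out.append(row[0])
    return out

def zib_pmf(alpha, omega, n_max=60):
    """Truncated PMF of ZIB(alpha, omega): Bell component scaled by (1-omega), plus extra mass at 0."""
    w = lambert_w(alpha)
    B = bell_numbers(n_max)
    c = math.exp(1.0 - math.exp(w))
    pmf = [(1.0 - omega) * c * w**y * B[y] / math.factorial(y) for y in range(n_max + 1)]
    pmf[0] += omega  # the degenerate component adds mass omega at zero
    return pmf

alpha, omega = 2.0, 0.3
pmf = zib_pmf(alpha, omega)
mean = sum(y * p for y, p in enumerate(pmf))            # ~(1-omega)*alpha
var = sum(y * y * p for y, p in enumerate(pmf)) - mean**2
```

The empirical `mean` and `var` should match $(1-\omega)\alpha$ and $(1-\omega)\alpha[1+\alpha\omega+W(\alpha)]$ up to truncation error.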

Multivariate ZIB distribution.

Model (not retyped here, since it appears earlier; see the paper for the full specification)

Remark 2: The PMF of $Y \sim MZIB(\alpha, \omega)$ can be expressed as $f(y; \alpha, \omega) = \omega \Pr(D = y) + (1-\omega)\Pr(X = y)$, where $D = (D_1, \dots, D_m)^T$ and $\{D_r\}_{r=1}^m \overset{iid}{\sim} Degenerate(0)$.

Distributional Properties

From the stochastic representation, we immediately obtain

$$E(Y) = (1-\omega)\alpha, \qquad E(YY^T) = (1-\omega)\left[\text{diag}\{\alpha\} + \text{diag}\{M\} + \alpha\alpha^T\right]$$

$$Var(Y) = (1-\omega)\left[\text{diag}\{\alpha\} + \text{diag}\{M\} + \omega\,\alpha\alpha^T\right]$$

where $M = (\alpha_1 W(\alpha_1), \dots, \alpha_m W(\alpha_m))^T$, $\alpha = (\alpha_1, \dots, \alpha_m)^T$, and $\text{diag}\{\alpha\}$ denotes the $m \times m$ diagonal matrix with entries $\alpha_r$ $(r = 1, \dots, m)$. We have that

$$Corr(Y_j, Y_k) = \omega (\alpha_j \alpha_k)^{\frac{1}{2}} \left[1 + W(\alpha_j) + \omega \alpha_j\right]^{-\frac{1}{2}} \left[1 + W(\alpha_k) + \omega \alpha_k\right]^{-\frac{1}{2}}, \quad j \neq k$$

If $\alpha_j = \alpha_k = \alpha$, we have $Corr(Y_j, Y_k) = \frac{\omega\alpha}{1 + W(\alpha) + \omega\alpha}$ for $j \neq k$. Also, $\omega \rightarrow 0 \implies Corr(Y_j, Y_k) \rightarrow 0$ for $j \neq k$.
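The correlation formula can be cross-checked by assembling it from its building blocks: with a shared $Z$ and independent $X_j, X_k$, $Cov(Y_j, Y_k) = \omega(1-\omega)\alpha_j\alpha_k$ and $Var(Y_r) = (1-\omega)\alpha_r[1 + W(\alpha_r) + \omega\alpha_r]$, and their ratio reproduces the closed form (a minimal sketch; `lambert_w` is my own helper, not from the paper):

```python
import math

def lambert_w(a, tol=1e-12):
    """Principal branch of the Lambert W function for a > 0 (Newton iteration)."""
    w = math.log(1.0 + a)
    for _ in range(100):
        e = math.exp(w)
        w_new = w - (w * e - a) / (e * (w + 1.0))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def mzib_corr(alpha_j, alpha_k, omega):
    """Corr(Y_j, Y_k) assembled from Cov = omega(1-omega) a_j a_k and the Var formula."""
    cov = omega * (1.0 - omega) * alpha_j * alpha_k
    var_j = (1.0 - omega) * alpha_j * (1.0 + lambert_w(alpha_j) + omega * alpha_j)
    var_k = (1.0 - omega) * alpha_k * (1.0 + lambert_w(alpha_k) + omega * alpha_k)
    return cov / math.sqrt(var_j * var_k)

def mzib_corr_closed(alpha_j, alpha_k, omega):
    """The closed-form expression above."""
    def f(a):
        return (1.0 + lambert_w(a) + omega * a) ** -0.5
    return omega * math.sqrt(alpha_j * alpha_k) * f(alpha_j) * f(alpha_k)
```

Note the $(1-\omega)$ factors cancel in the ratio, which is why $\omega$ alone controls the correlation between coordinates.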

Let $T_n(\theta)$ be the $n$-th Touchard polynomial (see vocabulary), which corresponds to the $n$-th moment of a Poisson distribution with parameter $\theta > 0$. Let $B_n(z_1, \dots, z_n)$ denote the $n$-th complete Bell polynomial. The $k$-th moment of $X \sim Bell(\alpha)$ is given by $E(X^k) = B_k(k_1, \dots, k_k)$ where $k_k = e^{W(\alpha)}T_k(W(\alpha))$. Thus, the mixed moments of $Y \sim MZIB(\alpha, \omega)$, for $k_1, \dots, k_m \in \mathbb{N}$, are given by

$$E\left(\prod_{r=1}^m Y_r^{k_r}\right) = (1-\omega) \prod_{r=1}^m B_{k_r}(k_{k_1}, \dots, k_{k_r})$$

where $k_{k_r} = e^{W(\alpha_r)}T_{k_r}(W(\alpha_r))$. This yields the following propositions.

Propositions 2 and 3; Corollary 3; Proposition 4.

I'm going to skip a few propositions (there are just too many; see the paper for all of them and their proofs).

Parameter Estimation

Let $Y_1 = (Y_{11}, \dots, Y_{1m})^T, \dots, Y_n = (Y_{n1}, \dots, Y_{nm})^T$ be $n$ independent, identically distributed random vectors such that $Y_i = (Y_{i1}, \dots, Y_{im})^T \sim MZIB(\alpha, \omega)$ for $i = 1, \dots, n$. Let $y_i = (y_{i1}, \dots, y_{im})^T$ be the realization of the random vector $Y_i$ for $i = 1, \dots, n$, and let $Y_{data} = [y_1, \dots, y_n]^T$ be the $n \times m$ matrix containing the observed multivariate data. Define $I = \{i : y_i = 0_m,\ i = 1, \dots, n\}$ and let $n_0 = \sum_{i=1}^n \mathbb{I}(y_i = 0_m)$ be the number of elements in $I$, i.e. the number of rows of $Y_{data}$ whose entries are all zero. Eliminating constants, the likelihood function can be expressed as:

$$L(\alpha, \omega) = \left[\omega + (1-\omega)e^{\xi_+}\right]^{n_0} (1-\omega)^{n-n_0} e^{(n-n_0)\xi_+} \prod_{r=1}^m W(\alpha_r)^{N_r}$$

where $N_r = \sum_{i \notin I} y_{ir} = \sum_{i=1}^n y_{ir}$ and $\xi_+ := \xi_+(\alpha)$. The corresponding log-likelihood function is:

$$l(\alpha, \omega) = n_0\ln\left[\omega + (1-\omega)e^{\xi_+}\right] + (n-n_0)\ln(1-\omega) + (n-n_0)\xi_+ + \sum_{r=1}^m N_r \ln W(\alpha_r)$$
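A minimal evaluator for this log-likelihood (a sketch, not the paper's code; I assume $\xi_+(\alpha) = \sum_{r=1}^m [1 - e^{W(\alpha_r)}]$, i.e. the log-probability of the all-zero vector under the baseline Bell model, and `lambert_w` is my own helper):

```python
import math

def lambert_w(a, tol=1e-12):
    """Principal branch of the Lambert W function for a > 0 (Newton iteration)."""
    w = math.log(1.0 + a)
    for _ in range(100):
        e = math.exp(w)
        w_new = w - (w * e - a) / (e * (w + 1.0))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def mzib_loglik(Y, alpha, omega):
    """Log-likelihood (constants dropped) of rows Y under MZIB(alpha, omega).
    Assumption: xi_+ = sum_r (1 - e^{W(alpha_r)}) = log Pr(X = 0_m) under the Bell baseline."""
    n, m = len(Y), len(alpha)
    n0 = sum(all(v == 0 for v in y) for y in Y)        # all-zero rows
    N = [sum(y[r] for y in Y) for r in range(m)]       # column totals
    w = [lambert_w(a) for a in alpha]
    xi = sum(1.0 - math.exp(wr) for wr in w)
    return (n0 * math.log(omega + (1.0 - omega) * math.exp(xi))
            + (n - n0) * math.log(1.0 - omega)
            + (n - n0) * xi
            + sum(N[r] * math.log(w[r]) for r in range(m)))
```

This is the objective a numerical optimizer would maximize over $(\alpha, \omega)$.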

The paper imposes assumptions on the behavior of $l(\alpha, \omega)$ as $n \rightarrow \infty$, on the regularity of the first two derivatives with respect to the model parameters, and on the existence and uniqueness of the ML estimate of $\alpha$.

Direct Maximization

Estimates can be obtained by direct maximization of the log-likelihood function.

The first two derivatives are derived in the paper. There is no closed form for the ML estimates, so a non-linear optimization algorithm is needed.

Optimization Process

It is simpler to optimize $l_{profile}(\alpha)$: the profile problem is $m$-dimensional, while the full problem is $(m+1)$-dimensional.
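For fixed $\alpha$, the maximization over $\omega$ can be done in closed form, which is what makes an $m$-dimensional profile possible. Setting $\partial l / \partial \omega = 0$ with $A = e^{\xi_+}$ gives $\hat{\omega}(\alpha) = (n_0/n - A)/(1 - A)$; this is my own derivation from the log-likelihood above, not copied from the paper:

```python
import math

def profile_omega(n, n0, xi_plus):
    """Closed-form maximizer over omega for fixed alpha (own derivation).
    A = exp(xi_plus) is the baseline probability of an all-zero vector."""
    A = math.exp(xi_plus)
    omega = (n0 / n - A) / (1.0 - A)
    return min(max(omega, 0.0), 1.0 - 1e-12)  # clip into [0, 1)

def loglik_omega_part(omega, n, n0, xi_plus):
    """The omega-dependent part of the log-likelihood (concave in omega)."""
    A = math.exp(xi_plus)
    return n0 * math.log(omega + (1.0 - omega) * A) + (n - n0) * math.log(1.0 - omega)
```

Plugging $\hat{\omega}(\alpha)$ back into $l(\alpha, \omega)$ leaves a problem in $\alpha$ only.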

Fisher information matrix: the variance-covariance matrix of the estimators is given by its inverse. See the paper for details.

EM Algorithm

We partition the set $I$ defined earlier as $I = I_e \cup I_s$, where $I_e$ contains the indices of extra zeros (coming from the degenerate distribution at zero) and $I_s$ contains the indices of structural zero vectors (coming from the baseline Bell distribution).

Let $V$ be the latent variable that counts the elements of $I_e$, splitting $n_0$ into $V$ and $n_0 - V$, with $V \in \{0, 1, \dots, n_0\}$.

Distribution

The EM algorithm has two steps. E-step: compute the conditional expectation of the complete-data log-likelihood function given $Y_{data}$ (the Q-function).

M-step: maximize the Q-function obtained in the E-step.
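As a sketch of how closed-form EM updates can look (my own illustrative implementation, not the paper's exact algorithm: I assume $\xi_+(\alpha) = \sum_r [1 - e^{W(\alpha_r)}]$ and use the fact that the Bell MLE is the sample mean, since $E(X) = \alpha$):

```python
import math

def lambert_w(a, tol=1e-12):
    """Principal branch of the Lambert W function for a > 0 (Newton iteration)."""
    w = math.log(1.0 + a)
    for _ in range(100):
        e = math.exp(w)
        w_new = w - (w * e - a) / (e * (w + 1.0))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def em_mzib(Y, m, n_iter=200):
    """EM sketch for MZIB: E-step classifies all-zero rows, M-step is closed form."""
    n = len(Y)
    n0 = sum(all(v == 0 for v in y) for y in Y)     # all-zero rows
    N = [sum(y[r] for y in Y) for r in range(m)]    # column totals
    omega = 0.5
    alpha = [max(N[r] / n, 1e-6) for r in range(m)]
    for _ in range(n_iter):
        # E-step: posterior probability that an all-zero row is an "extra" zero,
        # assuming xi_+ = sum_r (1 - e^{W(alpha_r)}) = log Pr(X = 0_m | alpha)
        xi = sum(1.0 - math.exp(lambert_w(a)) for a in alpha)
        p0 = omega / (omega + (1.0 - omega) * math.exp(xi))
        v = n0 * p0                                  # E[V]: expected extra zeros
        # M-step: closed-form updates (Bell MLE = sample mean over the Bell rows)
        omega = v / n
        alpha = [N[r] / (n - v) for r in range(m)]
    return alpha, omega
```

By construction, after each M-step the fitted marginal means satisfy $(1-\hat{\omega})\hat{\alpha}_r = N_r / n$, i.e. they match the sample means exactly.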

EM
