Summary: The paper introduces a multivariate family of distributions for multivariate count data with excess zeros, a multivariate extension of the univariate zero-inflated Bell distribution. Model parameters are estimated by traditional maximum likelihood. The authors also develop an EM algorithm to compute the ML estimates, with closed-form expressions for the updates.
Univariate Bell family's probability mass function:
$$f_{\mathrm{Bell}}(x;\alpha) := \Pr(X=x) = \frac{W(\alpha)^x\, e^{1-e^{W(\alpha)}}\, B_x}{x!}, \qquad x=0,1,\ldots,$$
where $\alpha>0$, $W(\cdot)$ denotes the principal branch of the Lambert W function, and $B_x$ are the Bell numbers. We write $X\sim\mathrm{Bell}(\alpha)$ for this univariate distribution; a small numerical sketch follows the property list below. Its properties:
Belongs to one-parameter exponential family of distributions
It is infinitely divisible
Variance larger than the mean: $E(X)=\alpha$ and $\mathrm{Var}(X)=\alpha[1+W(\alpha)]$, so $\mathrm{Var}(X)>E(X)$
The Bell distribution may therefore be suitable for data with overdispersion, unlike the Poisson.
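As a quick numerical check of the PMF and moment claims above, here is a minimal sketch (helper names are mine, not the paper's), assuming scipy's `lambertw` for the Lambert W function and sympy's `bell` for the Bell numbers:

```python
# Minimal sketch of the Bell(alpha) PMF and a check of E(X) and Var(X).
import numpy as np
from scipy.special import lambertw, factorial
from sympy import bell  # bell(x) returns the Bell number B_x


def bell_pmf(x, alpha):
    """Pr(X = x) = W(alpha)^x * exp(1 - e^W(alpha)) * B_x / x!"""
    w = lambertw(alpha).real
    return w**x * np.exp(1.0 - np.exp(w)) * float(bell(int(x))) / factorial(x)


alpha = 2.0
w = lambertw(alpha).real
xs = np.arange(0, 40)                       # truncated support, enough for alpha = 2
p = np.array([bell_pmf(x, alpha) for x in xs])

print(p.sum())                              # ~1 (normalization)
print((xs * p).sum(), alpha)                # E(X) = alpha
print((xs**2 * p).sum() - (xs * p).sum()**2,
      alpha * (1 + w))                      # Var(X) = alpha * [1 + W(alpha)]
```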
Univariate zero-inflated Bell (ZIB) distribution. Let $D$ denote a distribution degenerate at a constant $c$, i.e. $\Pr(D=c)=1$, written $D\sim\mathrm{Degenerate}(c)$.
Let $D\sim\mathrm{Degenerate}(0)$ and $X\sim\mathrm{Bell}(\alpha)$, and assume $D$ and $X$ are independent. The PMF of the ZIB distribution can then be expressed as
$$f_{\mathrm{ZIB}}(y;\alpha,\omega) = \begin{cases} \omega + (1-\omega)\,e^{1-e^{W(\alpha)}}, & y=0,\\[4pt] (1-\omega)\,\dfrac{W(\alpha)^y\, e^{1-e^{W(\alpha)}}\, B_y}{y!}, & y=1,2,\ldots \end{cases}$$
$$= \Big[\omega + (1-\omega)e^{1-e^{W(\alpha)}}\Big] I(y=0) + \Big[(1-\omega)\,\frac{W(\alpha)^y\, e^{1-e^{W(\alpha)}}\, B_y}{y!}\Big] I(y\neq 0),$$
where $\alpha>0$, $\omega\in(0,1)$, and $I(\cdot)$ denotes the indicator function. ZIB is a mixture of a distribution degenerate at zero and the Bell distribution.
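A hedged sketch of this PMF (function names are mine). Note that the single expression $\omega\,I(y=0) + (1-\omega)\,f_{\mathrm{Bell}}(y;\alpha)$ reproduces both branches of the piecewise form, since the Bell part already contributes $(1-\omega)e^{1-e^{W(\alpha)}}$ at $y=0$:

```python
# Sketch of the ZIB(alpha, omega) PMF as a two-component mixture.
import numpy as np
from scipy.special import lambertw, factorial
from sympy import bell


def zib_pmf(y, alpha, omega):
    w = lambertw(alpha).real
    bell_part = w**y * np.exp(1.0 - np.exp(w)) * float(bell(int(y))) / factorial(y)
    return omega * (y == 0) + (1.0 - omega) * bell_part


ys = np.arange(0, 40)
p = np.array([zib_pmf(y, alpha=1.5, omega=0.3) for y in ys])
print(p.sum())   # ~1: the zero-inflated PMF still normalizes
```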
$Y\sim\mathrm{ZIB}(\alpha,\omega)$ admits the following stochastic representation:
$$Y \stackrel{d}{=} ZX = \begin{cases} 0 & \text{with probability } \omega,\\ X & \text{with probability } 1-\omega, \end{cases}$$
where $Z\sim\mathrm{Bernoulli}(1-\omega)$, $X\sim\mathrm{Bell}(\alpha)$, and $Z\perp X$; the random variables on both sides of the equality ($Y$ and $ZX$) have the same distribution. From the above it is immediate that
$$E(Y)=E(Z)E(X)=(1-\omega)\alpha, \qquad E(Y^2)=E(Z)E(X^2)=(1-\omega)\alpha\big[1+\alpha+W(\alpha)\big],$$
and $\mathrm{Var}(Y)=(1-\omega)\alpha\big[1+\alpha\omega+W(\alpha)\big]$.
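A small simulation sketch of this representation (my own code, not the paper's): $X$ is drawn from the truncated Bell PMF, $Z$ from a Bernoulli, and the sample mean and variance of $Y=ZX$ are compared with the formulas above:

```python
# Simulate Y = Z * X and check E(Y) and Var(Y) for the ZIB distribution.
import numpy as np
from scipy.special import lambertw, factorial
from sympy import bell

rng = np.random.default_rng(0)
alpha, omega, n = 1.5, 0.3, 200_000

w = lambertw(alpha).real
xs = np.arange(0, 40)
pmf = np.array([w**x * np.exp(1 - np.exp(w)) * float(bell(int(x))) / factorial(x)
                for x in xs])
pmf /= pmf.sum()                         # renormalize the truncated support

X = rng.choice(xs, size=n, p=pmf)        # X ~ Bell(alpha), approximately
Z = rng.binomial(1, 1 - omega, size=n)   # Z ~ Bernoulli(1 - omega)
Y = Z * X

print(Y.mean(), (1 - omega) * alpha)
print(Y.var(), (1 - omega) * alpha * (1 + omega * alpha + w))
```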
The Poisson model can be affected by the overdispersion caused by zero-inflated data, so the Bell distribution may be more suitable.
Multivariate ZIB distribution.
Model (too lazy to type it all again since it was typed before). Remark 2: the PMF of $Y\sim\mathrm{MZIB}(\alpha,\omega)$ can be expressed as $f(y;\alpha,\omega)=\omega\Pr(D=y)+(1-\omega)\Pr(X=y)$, where $D=(D_1,\ldots,D_m)^T$ and $\{D_r\}_{r=1}^m \overset{\mathrm{iid}}{\sim} \mathrm{Degenerate}(0)$.
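As an illustration, a sketch of this mixture PMF, under the assumption (consistent with Remark 2 and with the likelihood factorization below) that the baseline vector $X$ has independent $\mathrm{Bell}(\alpha_r)$ components; function names are mine:

```python
# Sketch of the MZIB PMF: f(y) = omega * I(y = 0) + (1 - omega) * Pr(X = y),
# with X assumed to have independent Bell(alpha_r) components.
import numpy as np
from scipy.special import lambertw, factorial
from sympy import bell


def bell_pmf(x, alpha):
    w = lambertw(alpha).real
    return w**x * np.exp(1.0 - np.exp(w)) * float(bell(int(x))) / factorial(x)


def mzib_pmf(y, alpha, omega):
    y, alpha = np.asarray(y), np.asarray(alpha)
    bell_part = np.prod([bell_pmf(yr, ar) for yr, ar in zip(y, alpha)])
    return omega * float(np.all(y == 0)) + (1.0 - omega) * bell_part


print(mzib_pmf([0, 0], alpha=[1.0, 2.0], omega=0.25))   # inflated all-zero cell
print(mzib_pmf([1, 3], alpha=[1.0, 2.0], omega=0.25))
```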
Distributional Properties
From the stochastic representation, we immediately obtain
$$E(Y)=(1-\omega)\alpha, \qquad E(YY^T)=(1-\omega)\big[\mathrm{diag}\{\alpha\}+\mathrm{diag}\{M\}+\alpha\alpha^T\big],$$
$$\mathrm{Var}(Y)=(1-\omega)\big[\mathrm{diag}\{\alpha\}+\mathrm{diag}\{M\}+\omega\,\alpha\alpha^T\big],$$
where $M=(\alpha_1 W(\alpha_1),\ldots,\alpha_m W(\alpha_m))^T$ and $\mathrm{diag}\{\alpha\}$, with $\alpha=(\alpha_1,\ldots,\alpha_m)^T$, denotes the $m\times m$ diagonal matrix with elements $\alpha_r$ $(r=1,\ldots,m)$. We have that
$$\mathrm{Corr}(Y_j,Y_k)=\frac{\omega\,(\alpha_j\alpha_k)^{1/2}}{\big[1+W(\alpha_j)+\omega\alpha_j\big]^{1/2}\big[1+W(\alpha_k)+\omega\alpha_k\big]^{1/2}}, \qquad j\neq k.$$
If $\alpha_j=\alpha_k=\alpha$, we have $\mathrm{Corr}(Y_j,Y_k)=\dfrac{\omega\alpha}{1+W(\alpha)+\omega\alpha}$ for $j\neq k$. Also $\omega\to 0 \implies \mathrm{Corr}(Y_j,Y_k)\to 0$ for $j\neq k$.
Let $T_n(\theta)$ be the $n$-th Touchard polynomial (see vocabulary); it corresponds to the $n$-th moment of a Poisson distribution with parameter $\theta>0$. Let $B_n(z_1,\ldots,z_n)$ denote the $n$-th complete Bell polynomial. The $k$-th moment of $X\sim\mathrm{Bell}(\alpha)$ is given by $E(X^k)=B_k(\kappa_1,\ldots,\kappa_k)$, where $\kappa_k=e^{W(\alpha)}T_k(W(\alpha))$. Thus, the mixed moments of $Y\sim\mathrm{MZIB}(\alpha,\omega)$ for $k_1,\ldots,k_m\in\mathbb{N}$ are given by
$$E\Big(\prod_{r=1}^m Y_r^{k_r}\Big)=(1-\omega)\prod_{r=1}^m B_{k_r}\big(\kappa_1^{(r)},\ldots,\kappa_{k_r}^{(r)}\big), \qquad \kappa_k^{(r)}=e^{W(\alpha_r)}T_k(W(\alpha_r)).$$
We then get the following propositions.
I'm going to skip a few propositions (there are just too many; see the paper for all of them and their proofs).
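As a sanity check on the univariate moment formula above, here is a sketch (my own code) that builds $\kappa_j=e^{W(\alpha)}T_j(W(\alpha))$ from Stirling numbers of the second kind and evaluates the complete Bell polynomial through its standard binomial recurrence, comparing $E(X^k)$ against a brute-force sum over the truncated Bell PMF:

```python
# E(X^k) = B_k(kappa_1, ..., kappa_k) with kappa_j = e^W(alpha) * T_j(W(alpha)).
import numpy as np
from scipy.special import lambertw, comb, factorial
from sympy import bell                                       # Bell numbers, for the PMF check
from sympy.functions.combinatorial.numbers import stirling   # Stirling numbers, 2nd kind


def touchard(n, theta):
    """T_n(theta) = sum_k S(n, k) * theta^k."""
    return sum(int(stirling(n, k)) * theta**k for k in range(n + 1))


def complete_bell(kappas):
    """B_0, ..., B_n via the recurrence B_{j+1} = sum_i C(j, i) * B_{j-i} * kappa_{i+1}."""
    B = [1.0]
    for j in range(len(kappas)):
        B.append(sum(comb(j, i) * B[j - i] * kappas[i] for i in range(j + 1)))
    return B


alpha, k = 1.5, 4
w = lambertw(alpha).real
kappas = [np.exp(w) * touchard(j, w) for j in range(1, k + 1)]
moment = complete_bell(kappas)[k]                            # E(X^k) from the formula

# brute-force check against the (truncated) Bell PMF
xs = np.arange(0, 60)
pmf = np.array([w**x * np.exp(1 - np.exp(w)) * float(bell(int(x))) / factorial(x)
                for x in xs])
print(moment, (xs.astype(float)**k * pmf).sum())
```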
Parameter Estimation
Let $Y_1=(Y_{11},\ldots,Y_{1m})^T,\ldots,Y_n=(Y_{n1},\ldots,Y_{nm})^T$ be $n$ independent, identically distributed random vectors such that each $Y_i=(Y_{i1},\ldots,Y_{im})^T\sim\mathrm{MZIB}(\alpha,\omega)$ for $i=1,\ldots,n$. Let $y_i=(y_{i1},\ldots,y_{im})^T$ be the realization of the random vector $Y_i$ for $i=1,\ldots,n$, and let $Y_{\mathrm{data}}=[y_1,\ldots,y_n]^T$ be the $n\times m$ matrix containing the observed multivariate data. Define $I=\{i: y_i=0_m,\ i=1,\ldots,n\}$ and let $n_0=\sum_{i=1}^n I(y_i=0_m)$ be the number of elements in $I$, i.e. the number of rows of $Y_{\mathrm{data}}$ whose elements are all equal to zero. Eliminating constants, the likelihood function can be expressed as
$$L(\alpha,\omega)=\big[\omega+(1-\omega)e^{\xi_+}\big]^{n_0}\,(1-\omega)^{n-n_0}\,e^{(n-n_0)\xi_+}\prod_{r=1}^m W(\alpha_r)^{N_r},$$
where $N_r=\sum_{i\notin I} y_{ir}=\sum_{i=1}^n y_{ir}$ and $\xi_+:=\xi_+(\alpha)$. The corresponding log-likelihood function is
$$\ell(\alpha,\omega)=n_0\ln\!\big(\omega+(1-\omega)e^{\xi_+}\big)+(n-n_0)\ln(1-\omega)+(n-n_0)\xi_+ +\sum_{r=1}^m N_r\ln W(\alpha_r).$$
The paper makes some assumptions on the behavior of $\ell(\alpha,\omega)$ as $n\to\infty$, on the regularity of its first two derivatives with respect to the model parameters, and on the existence and uniqueness of the ML estimate of $\alpha$.
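A sketch of this log-likelihood as a function (helper names mine). I take $\xi_+(\alpha)=\sum_{r=1}^m\big(1-e^{W(\alpha_r)}\big)$, i.e. the log-probability of an all-zero vector under the baseline Bell components, which is what the zero term of the likelihood requires; the paper defines $\xi_+$ in the model section I skipped above:

```python
# Log-likelihood of the MZIB model, up to additive constants.
import numpy as np
from scipy.special import lambertw


def mzib_loglik(alpha, omega, Y):
    """Y is the n x m count matrix; alpha is length-m positive, omega a scalar in (0, 1)."""
    Y = np.asarray(Y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n = Y.shape[0]
    n0 = int(np.all(Y == 0, axis=1).sum())     # number of all-zero rows
    N = Y.sum(axis=0)                          # N_r = sum_i y_{ir}
    w = lambertw(alpha).real
    xi_plus = np.sum(1.0 - np.exp(w))          # assumed form of xi_+(alpha)
    return (n0 * np.log(omega + (1 - omega) * np.exp(xi_plus))
            + (n - n0) * np.log(1 - omega)
            + (n - n0) * xi_plus
            + np.sum(N * np.log(w)))
```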
Direct Maximization
The ML estimates can be obtained by direct maximization of the log-likelihood function.
There is no closed form for the ML estimates, so a non-linear optimization algorithm is needed.
It is simpler to optimize the profile log-likelihood $\ell_{\mathrm{profile}}(\alpha)$: the profile problem is $m$-dimensional, whereas the full problem is $(m+1)$-dimensional.
Fisher information matrix: the variance-covariance matrix of the estimators is given by its inverse. See the paper for details.
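A direct-maximization sketch with a generic optimizer (not the paper's implementation): the parameters are mapped to an unconstrained scale via $\log\alpha_r$ and $\mathrm{logit}(\omega)$, and the profile trick is skipped for simplicity. The variance-covariance matrix could then be approximated by inverting a numerical Hessian (observed information) at the optimum:

```python
# Crude direct ML fit of the MZIB model by numerical optimization.
import numpy as np
from scipy.optimize import minimize
from scipy.special import lambertw, expit


def fit_mzib_direct(Y):
    """Y is the n x m count matrix; returns (alpha_hat, omega_hat)."""
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    n0 = int(np.all(Y == 0, axis=1).sum())
    N = Y.sum(axis=0)

    def negloglik(theta):
        alpha, omega = np.exp(theta[:m]), expit(theta[m])   # unconstrained reparametrization
        w = lambertw(alpha).real
        xi_plus = np.sum(1.0 - np.exp(w))
        ll = (n0 * np.log(omega + (1 - omega) * np.exp(xi_plus))
              + (n - n0) * np.log(1 - omega)
              + (n - n0) * xi_plus
              + np.sum(N * np.log(w)))
        return -ll

    theta0 = np.concatenate([np.log(Y.mean(axis=0) + 0.1), [0.0]])  # crude start
    res = minimize(negloglik, theta0, method="Nelder-Mead")
    return np.exp(res.x[:m]), expit(res.x[m])
```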
EM Algorithm
We partition the set $I$ defined earlier as $I=I_e\cup I_s$, where $I_e$ denotes the set of indices of the extra zero vectors corresponding to the distribution degenerate at the point zero, and $I_s$ denotes the set of indices of the structural zero vectors corresponding to the baseline Bell distribution.
Let $V$ be the latent variable denoting the number of elements in the set $I_e$, which splits $n_0$ into $V$ and $n_0-V$, with $V\in\{0,1,\ldots,n_0\}$.
The EM algorithm has two steps. E-step: compute the conditional expectation of the complete-data log-likelihood function given $Y_{\mathrm{data}}$ (the Q-function).
M-step: maximize the Q-function obtained in the E-step.
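A sketch of what the closed-form EM updates could look like, derived here from the mixture likelihood above rather than copied from the paper (so the exact expressions may differ): the E-step computes the expected number $v$ of all-zero rows attributable to the degenerate component, and the M-step updates $\omega=v/n$ and $\alpha_r=N_r/(n-v)$, using the fact that the ML estimate of the Bell mean parameter is a sample mean:

```python
# EM sketch for the MZIB model with closed-form updates.
import numpy as np
from scipy.special import lambertw


def fit_mzib_em(Y, n_iter=200):
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    n0 = int(np.all(Y == 0, axis=1).sum())     # number of all-zero rows
    N = Y.sum(axis=0)                          # N_r = sum_i y_{ir}

    omega, alpha = 0.5, Y.mean(axis=0) + 0.1   # crude starting values
    for _ in range(n_iter):
        # E-step: expected count of zero rows coming from the degenerate component
        xi_plus = np.sum(1.0 - np.exp(lambertw(alpha).real))
        v = n0 * omega / (omega + (1 - omega) * np.exp(xi_plus))
        # M-step: closed-form updates
        omega = v / n
        alpha = N / (n - v)
    return alpha, omega
```

Usage would be along the lines of `alpha_hat, omega_hat = fit_mzib_em(Ydata)` with `Ydata` the $n\times m$ count matrix.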