Quasi- testing

# Testing quasi-likelihood fitting with glmer

## Intro/preliminaries

The point of this example is to test how glmer fits quasi-likelihood models, and in particular to confirm that it does the wrong thing when scaling variances. Using lme4 version , built . Generate data with a simple structure:

• one block random effect (on intercept only)
• one continuous covariate (uniformly distributed)
• individual-level random effects (equivalent to drawing individual deviates from a lognormal-Poisson distribution, but I'll use a quasi-Poisson model (at least to start with) for the distribution of individual $j$ in block $i$:
(1)
\begin{split} Y_{ij} & \sim \mbox{Poisson}(\eta) \\ \log(\eta) & = a + b x + \epsilon_{B,i} + \epsilon_{I,ij} \\ \epsilon_{B,i} & \sim \mbox{Normal}(0,\sigma^2_B) \\ \epsilon_{I,ij} & \sim \mbox{Normal}(0,\sigma^2_I) \end{split}

Single simulation: default parameters are \# blocks = 10, \# samples/block = 50,
$\sigma^2_B=2$, $a=0$, $b=2$, lognormal-Poisson overdispersion with $\sigma^2_I=1$.

> set.seed(1001)
> D1 = simdata(nblock = 50)


Simulation returns a data frame with y (response value), x (covariate), f (block factor), i (individual ID), as well as eta (linear predictor value), zb (block effects).

## Fitting

Fit Poisson, quasi-Poisson, lognormal-Poisson (one may need a hacked version of glmer to do this, to overcome the error about too many random effects levels):

> set.seed(1002)
> sPois0 <- glmer(y ~ x + (1 | f), family = poisson, data = simdata(nblock = 50,
+     sd.ind = 0))
> set.seed(1002)
> sPois0X <- glmer(y ~ x + (1 | f), family = poisson, data = simdata(nblock = 50))
> set.seed(1002)
> sQP0 <- glmer(y ~ x + (1 | f), family = quasipoisson, data = simdata(nblock = 50))
> set.seed(1002)
> sLNP0 <- glmer(y ~ x + (1 | f) + (1 | i), family = poisson, data = simdata(nblock = 50))
> if (FALSE) {
+     sNB0 <- glmer(y ~ x + (1 | f) + (1 | i), family = negative.binomial(theta = 1),
+         data = simdata(nblock = 50))
+ }


Extracting summary statistics: coefficients, s.e. of coefficients, s.d. of block effect, scale parameter. Results:
        int     x sd.int  sd.x  sd.f scale
Pois  0.184 2.007  0.252 0.018 1.775 1.000
PoisX 0.687 1.941  0.249 0.015 1.757 1.000
QP    0.687 1.941  1.794 0.105 1.757 7.205
PoisX 0.255 1.919  0.251 0.083 1.736 1.000


In general the batch code looks like something like this:
> sPois <- t(replicate(400, getvals(glmer(y ~ x + (1 | f), family = poisson,
+     data = simdata(nblock = 50, sd.ind = 0)))))


• sPois: no overdispersion, Poisson fit
• sPoisX: LN-Poisson overdispersion ($\sigma^2_I=1$), Poisson fit (i.e. ignore overdispersion)
• sQP: LN-Poisson overdispersion ($\sigma^2_I=1$), quasi-Poisson fit
• sLNP: LN-Poisson overdispersion ($\sigma^2_I=1$), LN-Poisson fit

(many possibilities: overdisp = \{ none, LN-Poisson, "quasi-Poisson", neg binomial \} $\times$ estimation = \{ none, quasi, LN-Poisson \}) + Analyze results
Defined scalevals function (subtract true values from estimates and scale by estimated standard error - should give some sort of $t$-distributed thing, or in this large-data-set case, close enough to $N(0,1)$). Also defined coverage function to compute coverage (for the scaled values above, thie equals the proportion of runs where $|x|<1.96$).

> alldat <- list(Pois = sPois, PoisX = sPoisX, QP = sQP, LNP = sLNP)
> (ctab <- t(sapply(alldat, coverage)))
int      x
Pois  0.9600 0.9475
PoisX 0.5675 0.0975
QP    1.0000 0.4475
LNP   0.9450 0.9425


The coverage is low for PoisX (Poisson fit ignoring overdispersion). Coverage is fine for Pois (Poisson fit without overdispersion in the data) and LNP (lognormal-Poisson fit), but not great for QP (too conservative for intercept, not enough for slope). Everyone estimates the block variance correctly except for QP:

Another way of looking at bias in the estimates and accuracy of the standard error calculation (which we have already assessed via coverage): compute the mean and standard deviation of scaled estimates of intercept and slope. These should have means of zero, standard deviation of 1 (remember that I am centering on the true value before scaling).
      int.mean x.mean int.sd   x.sd
Pois     0.003  0.005  1.016  0.989
PoisX    1.798 -1.615  1.187 26.324
QP       0.290 -0.124  0.180  3.846
LNP      0.057  0.009  0.986  1.017


Pois and LNP are OK, but PoisX and QP are bad. (Could plot densities and compare with $N(0,1)$ if that felt worthwhile.)
Conclusions so far:
• QP estimates are incorrectly scaling the estimated block standard deviation by the scale parameter
• QP estimation is also messed up (somehow) in terms of bias and standard error estimation. I really don't understand where the bias comes from, since I thought that quasi-Poisson estimation was essentially doing exactly the same estimation procedure as Poisson estimation except that at the end it inflated all error estimates by the scale parameter (correctly in the case of parameter standard errors, incorrectly in the case of variance components?) Shouldn't the mean estimates for QP be identical to those for PoisX? Did I screw something up?

## To do

• Think about alternative error models:
• QP: What is the "scale" parameter associated with a lognormal-Poisson deviate with $\sigma=1$? (What is the variance-mean relationship of the lognormal-Poisson? Does it depend on $\lambda$?) Try fitting a model that really has QP errors (kluge with Gamma-distributed errors with appropriate variance/mean scaling)
• negative binomial: try generating data with NB, fitting with QP and LNP. Could (perhaps) fit this model by using the negative.binomial family definition from MASS within a loop, searching for the minimum deviance, but would have to hack glmer somewhat: at the moment it gives an error (unknown GLM family: ‘Negative Binomial(1)’) [this error is easy enough to override by adding negative.binomial to the list of allowed families, but there's a lot of other low-level hacking that would be required to implement the variance function …]
page revision: 4, last edited: 19 Feb 2009 03:58