# Introduction

In principle, a "gold standard" for estimating $p$ values for the significance of a more complex model relative to a simpler (nested) one is (1) fit the simpler model to the data; (2a) simulate new values from the simpler values many (e.g. $N=1000$) times; (2b) each time, fit the more complex model to the simulated data and extract summary statistics (e.g. $t$ or $Z$ statistics for the parameters of interest); (3) estimate $p$ values or critical values from the null distribution of simulated statistics, e.g. the frequency with which $|t_{\mbox{obs}}|>|t_{\mbox{sim}}|$. Pinheiro and Bates 1 do this to illustrate the issue of boundary effects, and the `simulate.lme` method in the `nlme` package is set up to run precisely this procedure: see `?simulate.lme`. This procedure will generally be very slow, and it would be better to have workable approximations for $p$ values (or to dispense with them entirely, if your editor/reviewers/supervisor/co-authors will let you …) There are two problems with the analogous procedure for GLMMs fitted with [`g`]`lmer` in the `lme4` package: (1) a `simulate` method is not implemented for GLMM fits; and (2) in the case of quasilikelihood fits, it's not even clear how to simulate "quasi" data - how does one generate data with the right mean-variance relationship that make sense?

# Simulating from quasi- distributions

To answer the second question first: there are (at least) two different ways to do this, neither feels entirely satisfactory, neither is well tested. (From a simulation point of view we would be better off with compound distributions, either or negative binomial or lognormal-Poisson for counts [logitnormal-Beta or beta-binomial] for proportions.)

**Method 1**: generate data from a Gamma distribution with parameters adjusted to give the appropriate mean-variance relationship. That is, for shape parameter $a$ and scale parameter $s$, and quasi-Poisson parameters $\lambda$ (mean) and $\phi$ (overdispersion parameter), knowing that $\mbox{mean} = as$ and $\mbox{var}= as^2$ gives us (by the method of moments) $a=\mbox{var}/\mbox{mean}=\lambda \phi/\lambda=\phi$ and $s=\mbox{mean}^2/\mbox{var} = \lambda^2/(\lambda \phi) = \lambda/\phi$. The only other question is whether to round the values, which will make them more "realistic" but will mess up the mean and variance.

```
> rqpois <- function(n, lambda, phi, roundvals = FALSE) {
+ sc <- phi
+ sh <- lambda/phi
+ r <- rgamma(n, shape = sh, scale = sc)
+ if (roundvals)
+ round(r)
+ else r
+ }
```

**Method 2:** Alternatively, one can use a negative binomial with $k$ (overdispersion parameter) depending on the mean. For the NB, $\mbox{mean}=\mu$, $\mbox{var}=\mu(1+\mu/k)$. For the QP, $\mbox{var}=\lambda \phi$, $\mbox{mean}=\lambda$. So $\lambda=\mu$ and $\phi \mu = \mu(1+\mu/k)$ or $k=\mu/(\phi\mu-1)$. Therefore, this approach will only work if $\phi \mu >1$, which may be a problematic restriction …

```
> rqpois2 <- function(n, lambda, phi, min.k = 0.001, roundvals = FALSE) {
+ mu <- lambda
+ k <- mu/(phi * mu - 1)
+ if (any(k < min.k))
+ warning("some overdispersion parameters rounded up to",
+ min.k)
+ k <- pmax(min.k, k)
+ r <- rnbinom(n, mu = mu, size = k)
+ }
```

# Simulations

Read in a file containing (among other things) `my.mer.sim`, a drop-in replacement for the `simulate` method in `lme4` that can handle GLMMs.

`> source("glmmfuns.R")`

Now to test this out. First simulate some Poisson and some binomially distributed "data" (the

`eta1`variable, which is the value on the linear predictor scale plus some error variance, will serve as a test of normally distributed data):

```
> set.seed(1001)
> b_eff = rnorm(10,0,sd=0.1) ## block effects
> a = 1 ## intercept
> b = 1 ## slope
> zz = expand.grid(block=1:10,x=1:10,rep=1:10)
> zz <- within(zz,
+ { eta0 <- a+b*x+b_eff[block]
+ eta1 <- rnorm(length(eta0),eta0,0.1)
+ y <- rpois(length(eta0),exp(eta0))
+ })
> zz2 <- within(zz,
+ { eta0 <- a+b*x+b_eff[block]
+ eta1 <- rnorm(length(eta0),eta0,0.1)
+ y <- rbinom(length(eta0),prob=plogis(eta0),size=20)
+ })
```

Now fit these models. First, check that

`my.mer.sim`is in fact identical to the existing

`simulate`method for LMM fits:

```
> library(lme4)
> m0 <- lmer(eta1 ~ x + (1 | block), data = zz)
> s0 <- simulate(m0, seed = 1001)
> s1 <- my.mer.sim(m0, seed = 1001)
> all(abs(s0 - s1) < 1e-10)
[1] TRUE
```

```
> m3 <- lmer(cbind(y, 20 - y) ~ x + (1 | block), data = zz2, family = "binomial")
> s3 <- my.mer.sim(m3)
```

Refit a single simulation to check that we get similar results:

```
> m1 <- lmer(y~x+(1|block),data=zz,family="poisson")
> ## ignore warning about false convergence for now ...
> s2 <- my.mer.sim(m1)
> m2 <- refit(m1,s2)
> fixef(m2)
(Intercept) x
1.010175 1.000481
> fixef(m1)
(Intercept) x
0.9734259 1.0000034
```

# Null model: refitting the same model

Now try some replicates:

`> zzz = replicate(100, refit(m1, my.mer.sim(m1)))`

Extract coefficients from re-fits:

```
> library(reshape)
> coefs <- t(sapply(zzz, fixef))
```

Plot distribution of estimated coefficients:

`> print(densityplot(~value | X2, data = melt(coefs), scales = list(relation = "free")))`

Observed vs. expected (note variance increase, cluster effects):

```
> plot(jitter(s2), jitter(zz$y))
> lines(lowess(zz$y ~ s2))
```

Estimated critical values:

```
> quantile(coefs[, 1], c(0.025, 0.975))
2.5% 97.5%
0.864403 1.069991
> quantile(coefs[, 2], c(0.025, 0.975))
2.5% 97.5%
0.9992113 1.0004216
```

Estimated $p$-values:

```
> sum(abs(fixef(m1)[1]) > abs(coefs[, 1]))/nrow(coefs)
[1] 0.48
> sum(abs(fixef(m1)[2]) > abs(coefs[, 2]))/nrow(coefs)
[1] 0.51
```

As expected in this null case, they are not small.

# Nested model

Now try the procedure described above - fit the more complex model to values simulated from the null (less complex) model:

```
> m0 <- lmer(y ~ (1 | block), data = zz, family = "poisson")
> zzz0 = replicate(100, lmer(y ~ x + (1 | block), family = "poisson",
+ data = data.frame(zz[, c("block", "x")], y = my.mer.sim(m0))))
> coefs <- t(sapply(zzz0, fixef))
```

Estimated critical values for slope:

```
> quantile(coefs[, 2], c(0.025, 0.975))
2.5% 97.5%
-0.0002035154 0.0002062753
```

Estimated $p$-values:

```
> sum(abs(fixef(m1)[2]) < abs(coefs[, 2]))/nrow(coefs)
[1] 0
```