From the proposal:
Linear and generalized linear mixed models (LMM and GLMM). Many classical statistical methods assume that observations are independent and identically distributed. Analyses that rely on this assumption can give misleading results when observations are clustered in space, in time, or within subjects, because the clustering adds an unknown random component to the variance of the response variable. The linear mixed model (LMM, also called the linear mixed-effects model, LME) separates two sources of variation: random factors, which capture the effect of clustering and are often of no direct interest to the researcher (or are not identifiable), and fixed factors, which are the effects the researcher is really interested in. A generalized linear mixed model (GLMM) extends the generalized linear model by including random effects in the linear predictor in addition to the usual fixed effects; the response variable may follow a normal distribution or another distribution from the exponential family. We illustrate in this chapter how LMMs and GLMMs can be used both to model heterogeneity and to model correlation, which are not necessarily different goals. One very useful feature of the LMM is that a function representing a temporal or spatial correlation structure (or both) for the residuals can be incorporated as part of the model. The LMM may be viewed as a generalization of the variance component, randomized block, and regression-covariance linear models. We will show that this battery of models can be used to investigate a wide range of ecological questions.
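A possible illustration of the clustering point made in the proposal text: the sketch below simulates a balanced random-intercept design, y_ij = b_i + e_ij, and compares the standard error of the grand mean computed as if all observations were independent with its actual sampling variability across replicate simulations. Everything here is an assumption made for illustration (the function names, the design of 10 clusters of 20 observations, and the unit standard deviations); it is not taken from the proposal.

```python
import random
import statistics


def simulate_grand_mean(n_groups, n_per_group, sd_group, sd_resid, rng):
    """Simulate y_ij = b_i + e_ij; return (grand mean, naive iid standard error)."""
    ys = []
    for _ in range(n_groups):
        b = rng.gauss(0.0, sd_group)  # random effect shared by the whole cluster
        for _ in range(n_per_group):
            ys.append(b + rng.gauss(0.0, sd_resid))  # add observation-level noise
    # Naive standard error: pretends all observations are independent.
    naive_se = statistics.stdev(ys) / len(ys) ** 0.5
    return statistics.fmean(ys), naive_se


def compare_standard_errors(n_sims=1000, n_groups=10, n_per_group=20,
                            sd_group=1.0, sd_resid=1.0, seed=1):
    """Return (empirical SD of the grand mean, average naive standard error)."""
    rng = random.Random(seed)
    means, naive_ses = [], []
    for _ in range(n_sims):
        mean, naive_se = simulate_grand_mean(
            n_groups, n_per_group, sd_group, sd_resid, rng
        )
        means.append(mean)
        naive_ses.append(naive_se)
    return statistics.stdev(means), statistics.fmean(naive_ses)


true_se, naive_se = compare_standard_errors()
print(f"empirical SE of grand mean: {true_se:.3f}")
print(f"average naive (iid) SE:     {naive_se:.3f}")
```

Under these assumed settings the variance of the grand mean is 1/10 + 1/200, so its standard error is about 0.32, while the naive calculation should report roughly 0.1. The gap is the between-cluster variance that the independence assumption ignores.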
ROUGH outline/topics
- overview
- definition/meaning of random effects …
- distributions (reminder — should be well covered in GLM chapter)
- shrinkage
- G- vs R-side structures
- exponential family vs extensions (zero-inflation, neg binomial, Beta, etc.)
- graphical approaches/diagnostics
- fitting tools
  - method of moments (classical)
  - modern LMM/GLMM (deterministic)
  - brute force (Laplace)
  - Bayesian/stochastic (MCMC, data cloning)
- inference
  - approximations/sources of error:
    - quadratic log-likelihood surfaces (Wald, Hauck-Donner etc.: will this be in GLM chapter?)
    - estimate of dispersion parameter (e.g. F tests/denominator df)
    - non-Gaussian likelihood (Bartlett corrections etc.)
  - solutions:
    - Wald Z tests (assumes quadratic likelihood surface; Hauck-Donner etc.)
    - Wald t tests (still assumes quadratic surface; accounts for uncertainty of dispersion parameter)
    - conditional F tests (model comparison; doesn't assume quadratic surface)
    - profile likelihood/LRT
    - parametric bootstrap
    - MCMC
- extensions
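
One way the shrinkage item in the outline could be illustrated: a minimal sketch for a random-intercept model in which the variance components are treated as known. Each group's raw mean is pulled toward the overall mean by the weight var_group / (var_group + var_resid / n_i), so small groups shrink more than large ones. The function name, the toy data, and the variance values are hypothetical; in practice the variances would be estimated by the fitting software.

```python
def shrink_group_means(groups, var_group, var_resid):
    """Shrink raw group means toward the overall mean.

    groups: dict mapping group name -> list of observations.
    var_group, var_resid: between-group and residual variances, assumed known.
    """
    raw = {g: sum(y) / len(y) for g, y in groups.items()}
    # The variance of a raw group mean is var_group + var_resid / n_i;
    # the overall mean is the precision-weighted average of the raw means.
    precision = {
        g: 1.0 / (var_group + var_resid / len(y)) for g, y in groups.items()
    }
    mu = sum(precision[g] * raw[g] for g in groups) / sum(precision.values())

    results = {}
    for g, y in groups.items():
        # Shrinkage weight: near 1 for large groups, near 0 for small ones.
        weight = var_group / (var_group + var_resid / len(y))
        results[g] = {
            "n": len(y),
            "raw": raw[g],
            "weight": weight,
            "shrunk": mu + weight * (raw[g] - mu),
        }
    return mu, results


# Hypothetical toy data: three groups of very different sizes.
groups = {
    "A": [4.0, 6.0],
    "B": [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0],
    "C": [1.0],
}
mu, results = shrink_group_means(groups, var_group=1.0, var_resid=4.0)
print(f"overall mean: {mu:.2f}")
for name, r in results.items():
    print(f"{name}: n={r['n']}, raw={r['raw']:.2f}, "
          f"weight={r['weight']:.2f}, shrunk={r['shrunk']:.2f}")
```

With these assumed variances the single-observation group C gets a weight of 0.2 and the eight-observation group B a weight of 2/3, so C moves most of the way to the overall mean while B stays close to its raw mean.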