Trondheim mini-symposium
Notes for a workshop here.
Schedule
- introductions (problem descriptions?): $\approx$30 min
- data manipulation and visualization overview: 30 min
- data manip/vis lab (??), R troubleshooting: 1 hour
- break
- GLMM outline lecture: 1 hour
- GLMM lab: the rest of the morning
- lunch
- work on projects!
- present projects (??)
Data manipulation and visualization
Data manipulation
- Basic ideas: metadata and formats
- (Big data and spatial data need specialized tools (RDBMS, GIS): not covered, but see the Spatial task view, the High-performance computing task view, and the R Data manual)
- Wide vs long vs relational table format (multitable package)
- Data frames
- Data manipulation in R
- Alternative paradigms
- Programming paradigm: [, [[, indexing
- Logical (high-level) paradigm: subset, transform, with, merge (not attach)
- apply operations
- general-purpose vs special-purpose (colMeans etc.): performance and readability benefits vs. having to remember more specific tools
- plyr package: especially ddply
- Dates and times: not covered, but see chron, lubridate, Datetime …; always use the least complex date/time format possible
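The wide-vs-long distinction and the split-apply-combine idea behind tools such as ddply can be sketched in a few lines. This is a minimal illustration written in Python (standard library only) rather than R, which the workshop itself uses; the data, the column names (`site`, `y2009`, `y2010`, `year`, `count`) and the helper functions `wide_to_long` and `group_mean` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide-format data: one row per site, one column per year.
wide = [
    {"site": "A", "y2009": 3, "y2010": 5},
    {"site": "B", "y2009": 7, "y2010": 1},
]

def wide_to_long(rows, id_col, value_cols):
    """Reshape to long format: one row per (id, year) combination."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({id_col: row[id_col], "year": col, "count": row[col]})
    return long_rows

def group_mean(rows, key, value):
    """Split rows by `key`, apply the mean to `value`, combine into a dict."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}

long_data = wide_to_long(wide, "site", ["y2009", "y2010"])
year_means = group_mean(long_data, "year", "count")
```

Long format makes the grouping variable an ordinary column, which is why the grouped summary needs no knowledge of how many year columns the wide table had.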
Data visualization
- Three goals:
- exploratory: understand patterns in data (efficiently, nonparametrically)
- diagnostic: understand departures from model (nonparametrically)
- presentation: tell a story (elegantly but honestly: trying to match the statistical model used, but not always)
- Tools
- for graphics generally
- Base R (programming/canvas paradigm): mish-mosh of functions at different levels of abstraction (plot; lines; segments, rect, arrows, text). Many, many, many extensions (plotrix package).
- grid graphics: a low-level toolbox which can be used on its own. Object- rather than canvas-oriented. Murrell book.
- lattice: built on grid, much higher-level
- ggplot(2): also built on grid, similar to lattice but even higher-level (and weird) (Wilkinson, Wickham books).
- weird/advanced (not covered): dynamic/3D/alternative: rgl, rggobi, playwith, animation
- for diagnostics:
- model methods (e.g. plot.lm), more flexible versions in lme (xyplot.lme)
- or write your own
- fortify in ggplot
- Mixed model-focused approaches:
- Pooled and fixed-effect (grouped) models
GLMMs
Reminder about GLMs
- families (i.e. distributions): Poisson, binomial, negative binomial (types 1 and 2), Gamma, lognormal …
- overdispersion
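A common rough check for overdispersion is the Pearson chi-square statistic divided by the residual degrees of freedom: values near 1 are consistent with the Poisson assumption, values well above 1 suggest overdispersion. The sketch below does this for the simplest possible case, an intercept-only Poisson model, where the fitted mean is just the sample mean. It is written in Python for illustration only (the workshop uses R); the counts and the function name `pearson_dispersion` are made up.

```python
def pearson_dispersion(y):
    """Pearson chi-square / residual df for an intercept-only Poisson fit."""
    n = len(y)
    mu = sum(y) / n  # the Poisson MLE of the mean is the sample mean
    pearson = sum((yi - mu) ** 2 / mu for yi in y)
    return pearson / (n - 1)  # one parameter (the mean) was estimated

counts = [0, 1, 0, 7, 2, 0, 12, 1, 0, 3]  # made-up counts, not workshop data
dispersion = pearson_dispersion(counts)
```

For a model with covariates the same ratio would use the fitted values and the model's residual degrees of freedom in place of the sample mean and `n - 1`.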
Reminder about mixed models
- "fixed" vs "random"
- nested vs crossed
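Whether two grouping factors are nested or crossed is a property of the data that can be checked directly: one factor is nested in another if each of its levels occurs with exactly one level of the other. The sketch below is a Python illustration under that definition (the workshop uses R); the factor names and the helper `is_nested` are hypothetical.

```python
def is_nested(inner, outer):
    """True if every level of `inner` occurs with exactly one level of `outer`."""
    seen = {}
    for i, o in zip(inner, outer):
        # setdefault records the first outer level seen for this inner level
        if seen.setdefault(i, o) != o:
            return False
    return True

site = ["s1", "s1", "s2", "s2"]
plot_unique = ["s1p1", "s1p2", "s2p1", "s2p2"]  # plot labels unique across sites
plot_reused = ["p1", "p2", "p1", "p2"]          # same labels reused in each site
year = [2009, 2010, 2009, 2010]                 # every year observed at every site

nested_unique = is_nested(plot_unique, site)
nested_reused = is_nested(plot_reused, site)
nested_year = is_nested(year, site)
```

The `plot_reused` case shows a common pitfall: plots that are physically nested within sites look crossed to the software when their labels are reused, so the nesting has to be stated explicitly or the labels made unique.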
Estimation
- PQL (penalized quasi-likelihood): glmmPQL
- Laplace approximation/AGHQ (adaptive Gauss-Hermite quadrature): lme4, glmmADMB
- MCMC: MCMCglmm
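To make the Laplace approximation concrete, the sketch below applies it to the marginal likelihood of a single group in a random-intercept Poisson model and compares the result with brute-force numerical integration. It is a toy illustration in Python (the workshop uses R) and does not reproduce the internals of any package listed above; the data and the values of the hypothetical parameters `beta` (fixed intercept on the log scale) and `sigma` (random-effect standard deviation) are invented.

```python
import math

def log_integrand(b, y, beta, sigma):
    """log of [Poisson likelihood of y given b] times [Normal(0, sigma^2) density of b]."""
    eta = beta + b  # linear predictor on the log scale
    loglik = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1) for yi in y)
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - b ** 2 / (2 * sigma ** 2)
    return loglik + logprior

def laplace_marginal(y, beta, sigma, iters=50):
    """Laplace approximation: Gaussian integral matched at the mode of the integrand."""
    n, total = len(y), sum(y)
    b = 0.0
    for _ in range(iters):  # Newton's method for the mode of the log integrand
        grad = total - n * math.exp(beta + b) - b / sigma ** 2
        hess = -n * math.exp(beta + b) - 1 / sigma ** 2
        b -= grad / hess
    hess = -n * math.exp(beta + b) - 1 / sigma ** 2  # curvature at the mode
    return math.exp(log_integrand(b, y, beta, sigma)) * math.sqrt(2 * math.pi / -hess)

def trapezoid_marginal(y, beta, sigma, lo=-6.0, hi=6.0, steps=4000):
    """Reference value: trapezoid rule over a wide grid of random-effect values."""
    h = (hi - lo) / steps
    vals = [math.exp(log_integrand(lo + k * h, y, beta, sigma)) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

y = [3, 5, 4, 6, 2]  # made-up counts for one group
approx = laplace_marginal(y, beta=1.0, sigma=1.0)
reference = trapezoid_marginal(y, beta=1.0, sigma=1.0)
rel_error = abs(approx / reference - 1)
```

The approximation is good here because five counts with moderate means make the integrand close to Gaussian; with few observations per group or binary responses the integrand is less Gaussian, which is the usual motivation for moving from Laplace to AGHQ.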
Inference
- Wald
- LRTs (profiles)
- Posterior densities
- IC-based approaches
- conditional AIC
- Inference on RE variances
- Parametric bootstrap
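The parametric bootstrap replaces the asymptotic reference distribution of a test statistic with one simulated from the fitted null model. The sketch below shows the mechanics for a likelihood ratio test in a deliberately simplified setting: a plain Poisson comparison of two groups with no random effects. It is written in Python for illustration (the workshop uses R); the data and all function names are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen only because the standard library has none.

```python
import math
import random

def poisson_loglik_at_mle(y):
    """Poisson log-likelihood at the MLE (sample mean), dropping the y! terms."""
    total = sum(y)
    if total == 0:
        return 0.0  # mean of zero: the remaining terms vanish
    mu = total / len(y)
    return total * math.log(mu) - total  # n * mu equals total

def lrt_stat(a, b):
    """2 * (log-lik with separate group means - log-lik with a common mean)."""
    full = poisson_loglik_at_mle(a) + poisson_loglik_at_mle(b)
    return 2 * (full - poisson_loglik_at_mle(a + b))

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def bootstrap_p(a, b, nsim=2000, seed=1):
    """Simulate from the fitted null model and compare simulated LRTs to the observed one."""
    rng = random.Random(seed)
    observed = lrt_stat(a, b)
    mu0 = sum(a + b) / len(a + b)  # null model: one common mean
    exceed = 0
    for _ in range(nsim):
        sim_a = [rpois(mu0, rng) for _ in a]
        sim_b = [rpois(mu0, rng) for _ in b]
        if lrt_stat(sim_a, sim_b) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (nsim + 1)

group_a = [2, 3, 1, 4, 2, 3]  # made-up counts
group_b = [6, 5, 7, 4, 8, 6]
lrt_obs, p_boot = bootstrap_p(group_a, group_b)
```

For a mixed model the same loop applies, with the simulation and refitting steps done by the model-fitting software: simulate responses from the fitted reduced model, refit both models to each simulated data set, and collect the statistic.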
Challenges
- Estimation glitches
- Convergence problems
- Zero variance estimates/Perfect correlations
- $p$ values: finite-size corrections
- Spatial structure
- Temporal structure
- Phylogenetic/pedigree structure
- Nonlinearity (GAMs)
- Ordinal data
- Zero-inflation
Data files etc.
- Banta_trondheim.pdf, Banta_trondheim.R, Banta_trondheim.Rnw
- Banta_TotalFruits.csv
- glmm_funs.R
- Banta_glmmADMB_fits.RData
- Banta_MCMCglmm_fit.RData
- glmmADMB_0.6.4.tar.gz
Raw materials
- Harvard Forest, UWO, Concordia talks
- Banta examples
- other worked examples on this site (Owls, glycera, Culcita)
- sparrows & moose examples
- Vonesh seed predation example (lab 2 from the Book)
- examples at NCEAS site: wildflowers??
page revision: 22, last edited: 31 Aug 2011 06:58