Trondheim mini-symposium

Notes for a workshop in Trondheim.

Schedule

  • introductions (problem descriptions?): $\approx$30 min
  • data manipulation and visualization overview: 30 min
  • data manip/vis lab (??), R troubleshooting: 1 hour
  • break
  • GLMM outline lecture: 1 hour
  • GLMM lab: the rest of the morning
  • lunch
  • work on projects!
  • present projects (??)

Data manipulation and visualization

Data manipulation

  • Basic ideas: metadata and formats
  • Big data and spatial data need specialized tools (RDBMS, GIS): not covered, but see the Spatial task view, the High-performance computing task view, and the R Data Import/Export manual
  • Wide vs long vs relational table format (multitable package)
  • Data frames
  • Data manipulation in R
    • Alternative paradigms
      • Programming paradigm: [, [[, indexing
      • Logical (high-level) paradigm: subset, transform, with, merge (not attach)
    • apply operations
      • general-purpose vs special purpose (colMeans etc.); performance, readability benefit vs. remembering more specific tools
    • plyr package: especially ddply
    • Dates and times: not covered, but see chron, lubridate, Datetime …; always use the least complex date/time format possible
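
The two manipulation paradigms and the general-purpose vs special-purpose trade-off listed above can be sketched in base R. This is only an illustration: the data frame `d`, the lookup table `site_info`, and all column names are made up, and base `aggregate` stands in for a per-group summary in the spirit of `ddply` so that the sketch runs without plyr installed.

```r
# Hypothetical toy data: three sites, two measurements each
d <- data.frame(
  site  = rep(c("A", "B", "C"), each = 2),
  count = c(3, 5, 0, 2, 7, 9),
  area  = c(1.0, 2.0, 1.0, 0.5, 2.0, 3.0)
)

# Programming paradigm: explicit indexing with [ and [[
idx_rows <- d[d[["count"]] > 2, c("site", "count")]

# Logical (high-level) paradigm: subset() and transform()
sub_rows <- subset(d, count > 2, select = c(site, count))
d2 <- transform(d, density = count / area)

# with() evaluates an expression inside the data frame, no attach() needed
mean_density <- with(d2, mean(density))

# merge() joins two tables on a shared column (relational-table style)
site_info <- data.frame(site = c("A", "B", "C"),
                        habitat = c("forest", "field", "forest"))
merged <- merge(d2, site_info, by = "site")

# General-purpose apply() vs special-purpose colMeans(): same answer,
# but colMeans() is faster and states the intent directly
num <- as.matrix(d[, c("count", "area")])
m_apply   <- apply(num, 2, mean)
m_special <- colMeans(num)

# Per-group summary (split-apply-combine) with base aggregate()
by_site <- aggregate(count ~ site, data = d, FUN = sum)
```

The indexing and `subset` versions select the same rows; which one reads better is largely a matter of taste and of whether the code lives inside a function.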

Data visualization

  • Three goals:
    • exploratory: understand patterns in data (efficiently, nonparametrically)
    • diagnostic: understand departures from model (nonparametrically)
    • presentation: tell a story (elegantly but honestly: trying to match the statistical model used, but not always)
  • Tools
    • for graphics generally
      • Base R (programming/canvas paradigm): mish-mosh of functions at different levels of abstraction (plot; lines; segments, rect, arrows, text). Many, many, many extensions (e.g. the plotrix package).
      • grid graphics: a low-level toolbox which can be used on its own. Object- rather than canvas-oriented. Murrell book.
      • lattice: built on grid, much higher-level
      • ggplot(2): also built on grid, similar to lattice but even higher-level (and weird) (Wilkinson, Wickham books).
      • weird/advanced (not covered): dynamic/3D/alternative: rgl, rggobi, playwith, animation
    • for diagnostics:
      • model methods (e.g. plot.lm), more flexible lattice-based versions for lme fits (plot.lme in nlme)
      • or write your own
      • fortify in ggplot2
    • Mixed model-focused approaches:
      • Pooled and fixed-effect (grouped) models
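
As one possible reading of "write your own" diagnostics, the following base R sketch plots residuals against fitted values and adds a nonparametric smooth to show departures from the model. The data are simulated for illustration only (the curved mean is an assumption chosen so that a straight-line fit visibly fails), and `diag_df` is a hypothetical name.

```r
# Hypothetical simulated data: the true mean is curved, the fitted model is linear
set.seed(101)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x + 0.1 * (x - 5)^2 + rnorm(100, sd = 0.5)
fit <- lm(y ~ x)

# Collect fitted values and residuals in one data frame
diag_df <- data.frame(fitted = fitted(fit), resid = residuals(fit))

# Nonparametric (lowess) smooth of residuals against fitted values
smooth <- lowess(diag_df$fitted, diag_df$resid)

# Residuals-vs-fitted plot; a smooth that bends away from the zero line
# suggests the mean structure is misspecified
plot(resid ~ fitted, data = diag_df,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(smooth, col = "red")
```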

GLMMs

Reminder about GLMs

  • families (i.e. distributions): Poisson, binomial, negative binomial (NB1 and NB2), Gamma, lognormal …
  • overdispersion
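
A rough way to check for overdispersion in a Poisson GLM is to compare the Pearson chi-square statistic with the residual degrees of freedom. The sketch below uses simulated data (negative binomial counts with an arbitrary `size = 1`, chosen only so that the overdispersion is obvious); all variable names are hypothetical.

```r
# Hypothetical overdispersed counts: negative binomial,
# variance = mu + mu^2 / size, which exceeds the Poisson variance mu
set.seed(1)
n  <- 200
x  <- runif(n)
mu <- exp(0.5 + 1.0 * x)
y  <- rnbinom(n, mu = mu, size = 1)

# Fit a Poisson GLM that ignores the extra variance
pois_fit <- glm(y ~ x, family = poisson)

# Pearson chi-square divided by residual df;
# values well above 1 suggest overdispersion
pearson_chisq <- sum(residuals(pois_fit, type = "pearson")^2)
disp_ratio <- pearson_chisq / df.residual(pois_fit)

# Quasi-Poisson gives the same coefficients but scales the standard
# errors by the square root of the estimated dispersion
quasi_fit <- glm(y ~ x, family = quasipoisson)
quasi_disp <- summary(quasi_fit)$dispersion
```

The quasi-likelihood fix is only one option; a negative binomial family or an observation-level random effect are alternatives when a full likelihood is needed.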

Reminder about mixed models

  • "fixed" vs "random"
  • nested vs crossed

Estimation

  • PQL: glmmPQL
  • Laplace/AGHQ: lme4, glmmADMB
  • MCMC: MCMCglmm

Inference

  • Wald
  • LRTs (profiles)
  • Posterior densities
  • IC-based approaches
    • conditional AIC
  • Inference on RE variances
  • Parametric bootstrap
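
The parametric bootstrap can be sketched as a simulate-and-refit loop around a likelihood-ratio statistic. To stay self-contained in base R, this example uses a plain Poisson GLM rather than a GLMM, which is a simplification; the same loop carries over in principle to mixed models. The data are simulated, and `lr_stat`, `nsim = 200`, and the other names are illustrative choices.

```r
# Hypothetical data simulated with a real effect of x
set.seed(42)
n <- 100
x <- runif(n)
y <- rpois(n, lambda = exp(0.2 + 0.8 * x))

# Null (intercept-only) and full models
null_fit <- glm(y ~ 1, family = poisson)
full_fit <- glm(y ~ x, family = poisson)

# Likelihood-ratio statistic: twice the difference in log-likelihoods
lr_stat <- function(m0, m1) as.numeric(2 * (logLik(m1) - logLik(m0)))
obs <- lr_stat(null_fit, full_fit)

# Parametric bootstrap: simulate responses from the fitted null model,
# refit both models to each simulated response, and collect the statistic
nsim <- 200
sims <- simulate(null_fit, nsim = nsim)
boot <- vapply(seq_len(nsim), function(i) {
  d <- data.frame(ysim = sims[[i]], x = x)
  lr_stat(glm(ysim ~ 1, family = poisson, data = d),
          glm(ysim ~ x, family = poisson, data = d))
}, numeric(1))

# Bootstrap p-value, with the +1 correction so it is never exactly zero
p_boot <- (1 + sum(boot >= obs)) / (nsim + 1)

# Asymptotic chi-square p-value for comparison
p_chisq <- pchisq(obs, df = 1, lower.tail = FALSE)
```

With well-behaved data like these the two p-values should roughly agree; the bootstrap earns its cost when the chi-square approximation is doubtful, for example with small samples or when testing variance components.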

Challenges

  • Estimation glitches
    • Convergence problems
    • Zero variance estimates/Perfect correlations
  • $p$ values: finite-size corrections
  • Spatial structure
  • Temporal structure
  • Phylogenetic/pedigree structure
  • Nonlinearity (GAMs)
  • Ordinal data
  • Zero-inflation

Data files etc.

Raw materials

  • Harvard Forest, UWO, Concordia talks
  • Banta examples
  • other worked examples on this site (Owls, glycera, Culcita)
  • sparrows & moose examples
  • Vonesh seed predation example (lab 2 from the Book)
  • examples at NCEAS site: wildflowers??