Factors

Up to now have used only numerical explanatory variables, but can use categorical ones too, called factors. Regression estimates an effect for each level of the factor (relative to a baseline level), equivalent to specifying a different intercept for each level. Can use interaction terms (:) to specify that the model should use a different slope for each level as well.
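A minimal sketch of the two cases in R, using simulated data. The names group, x, y and dat are made up for illustration and are not from the notes.

```r
# Simulated data: three groups with different intercepts and slopes.
set.seed(1)
n <- 60
group <- factor(rep(c("a", "b", "c"), each = n / 3))
x <- runif(n, 0, 10)
intercepts <- c(1, 3, 5)      # one true intercept per level
slopes <- c(1.0, 1.5, 0.5)    # one true slope per level
g <- as.integer(group)
y <- intercepts[g] + slopes[g] * x + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x = x, group = group)

# Factor as a main effect: different intercept per level, common slope.
fit_intercepts <- lm(y ~ x + group, data = dat)

# Adding the x:group interaction: different intercept and slope per level.
fit_slopes <- lm(y ~ x + group + x:group, data = dat)

print(coef(fit_intercepts))
print(coef(fit_slopes))
```

The coefficients groupb and groupc are intercept differences from the baseline level a; x:groupb and x:groupc are slope differences from the baseline slope. The formula y ~ x * group is shorthand for the second model.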

Anova models

Regression models where all explanatory variables are categorical. Assume response is normally distributed around a mean determined by the factor levels. Other standard assumptions (e.g. equal variances) still apply. Can write in offset or dummy variable forms. R treats the first factor level as the baseline, so may need to rearrange level order (e.g. with relevel()).
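A small sketch of the baseline issue, on simulated data. The names treatment and response and the level labels are hypothetical. By default R orders levels alphabetically, so the baseline here would be drugA unless changed.

```r
# Simulated one-way layout: two drugs and a placebo, 10 observations each.
set.seed(2)
treatment <- factor(rep(c("drugA", "drugB", "placebo"), each = 10))
response <- c(5, 7, 3)[as.integer(treatment)] + rnorm(30)

print(levels(treatment))   # alphabetical order: drugA is the default baseline

# Make placebo the baseline (first) level.
treatment <- relevel(treatment, ref = "placebo")

fit <- lm(response ~ treatment)
print(coef(fit))                # intercept = placebo mean, others = offsets from it
print(head(model.matrix(fit)))  # the 0/1 dummy variables R builds from the factor
```

With the default treatment contrasts the intercept is the baseline mean and each remaining coefficient is that level's difference from the baseline.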

The F-test tests the null hypothesis that all level means are equal. HSD plot (HSD.plot(lm())) displays the pairwise differences between level means graphically.
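A sketch of both steps on simulated data (the names group and y are made up). HSD.plot does not appear to be a base R function and is presumably a course helper, so this sketch uses base R's TukeyHSD() on an aov fit instead; treat it as a stand-in, not as the same function.

```r
# Simulated data: levels a and b share a mean, level c is shifted.
set.seed(3)
group <- factor(rep(c("a", "b", "c"), each = 15))
y <- c(10, 10, 13)[as.integer(group)] + rnorm(45)

# F-test of the null hypothesis that all level means are equal.
print(anova(lm(y ~ group)))

# Tukey honest significant differences: one interval per pair of levels.
hsd <- TukeyHSD(aov(y ~ group))
print(hsd)

# Plot the intervals; those excluding zero indicate differing means.
plot(hsd)
```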

Two-factor models

Usually want to split each cell mean into an overall mean, a main effect for each factor, and an interaction effect.

Use plot.design(df) to look at level means, interaction.plot(factor1, factor2, response) to look at interactions (the third argument is the response, not an explanatory variable), and anova(lm(...)) for formal significance tests.
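As an illustration, the three calls on R's built-in warpbreaks data set (response breaks, factors wool and tension). The choice of data set is an assumption for the example, not something from the notes.

```r
data(warpbreaks)

# Mean of breaks at each level of each factor.
plot.design(warpbreaks)

# Mean of breaks against tension, one trace per wool type.
# Non-parallel traces suggest an interaction.
with(warpbreaks, interaction.plot(tension, wool, breaks))

# Two-factor model with interaction; wool * tension expands to
# wool + tension + wool:tension.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
tab <- anova(fit)
print(tab)   # F-tests for wool, tension and wool:tension
```

If the wool:tension row is not significant, the usual next step is to refit with main effects only (breaks ~ wool + tension).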

Models with many continuous and categorical variables

General approach is to fit a separate regression on the continuous variables for each combination of levels of the categorical variables. This produces many possible models, so need variable selection techniques. (Remember: a model that includes a high-order interaction must also include the lower-order terms it contains. If there is not enough data, drop the high-order interactions.)
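One possible way to do the selection is stepwise search by AIC with step(), sketched here on the built-in mtcars data. The data set, the chosen variables and the use of AIC are assumptions for the example; the notes do not say which selection technique is intended.

```r
data(mtcars)

# Treat transmission type as a factor (stored as 0/1 in mtcars).
cars <- transform(mtcars, am = factor(am))

# Full model: separate intercept and separate wt and hp slopes per level of am.
full <- lm(mpg ~ (wt + hp) * am, data = cars)

# Backward stepwise selection by AIC. step() respects the hierarchy:
# a main effect is not dropped while an interaction containing it remains.
reduced <- step(full, trace = 0)

print(formula(reduced))
print(attr(terms(reduced), "term.labels"))
```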