Introduction
Statistical models summarise relationships between variables. Regression models focus on one variable (the response) and how it can be modelled by other (explanatory) variables. In general, response = function of explanatory variables + random scatter. We need to describe both the function and the scatter, balancing simplicity against accuracy.
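To make "response = function of explanatory variables + random scatter" concrete, here is a minimal Python sketch. It assumes the simplest case, which the notes do not prescribe: one explanatory variable, a straight-line function, and normally distributed scatter. The names simulate and fit_line and all numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate(n, intercept, slope, sigma, seed=0):
    """Generate data as: response = straight line in x + random scatter."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)        # explanatory variable
    scatter = rng.normal(0.0, sigma, size=n)  # assumed normal scatter
    y = intercept + slope * x + scatter       # response
    return x, y

def fit_line(x, y):
    """Least-squares estimates of [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])  # column of ones gives the intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Illustrative values only: true intercept 2.0, slope 0.5, scatter sd 1.0.
x, y = simulate(n=200, intercept=2.0, slope=0.5, sigma=1.0)
intercept_hat, slope_hat = fit_line(x, y)

# What the fitted function leaves unexplained is our estimate of the scatter.
residuals = y - (intercept_hat + slope_hat * x)
print("estimated intercept and slope:", intercept_hat, slope_hat)
print("estimated scatter sd:", residuals.std(ddof=2))
```

The estimates should land close to, but not exactly on, the values used to generate the data; the gap is the scatter at work.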
Our approach:
- explore data using graphics and summary statistics
- construct useful model (iterative)
- use model to gain knowledge about system
- communicate findings
Role of graphics
- data cleaning: locate outliers, gross errors, special codings etc.
- exploration: what type of model could be useful?
- diagnostics: what is wrong with our model?
Data cleaning
All real data contain errors. Data should be carefully checked prior to analysis; suitable plots are likely to reveal any gross errors that remain.
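As a numerical companion to plotting, the following Python sketch flags values that lie far from the bulk of the data. It uses a median/MAD rule, which is a choice made here for illustration and not a method taken from these notes. The data, the function name flag_suspects and the threshold of 5 are all hypothetical. A flagged value is a candidate for inspection, not automatically an error.

```python
import numpy as np

def flag_suspects(values, threshold=5.0):
    """Return a boolean mask of values far from the median, in robust units."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # median absolute deviation
    if mad == 0:
        # No spread to scale by, so nothing can be flagged with this rule.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD to match the sd for normally distributed data.
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold

# Made-up heights in cm: 1750.0 looks like a value entered in mm,
# and -99.0 looks like a special coding for "missing".
heights_cm = np.array([172.0, 168.5, 181.0, 1750.0, 165.2, -99.0, 177.3, 170.1])
suspect = flag_suspects(heights_cm)
print("suspect values:", heights_cm[suspect])
```

The median and MAD are used instead of the mean and standard deviation because a single gross error can inflate the standard deviation enough to hide itself.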