The multiple regression model

The regression model assumes:

responses normally distributed with means $\mu$ and constant variance $\sigma^2$
mean response is a linear combination of the covariates

The relationship $\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ can be visualised as a plane, determined by the $\beta$'s. Data will be scattered above and below the plane. An alternative form of the model that emphasises this is $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$.
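To make the "scatter about a plane" picture concrete, here is a minimal sketch that simulates responses from the model. The function name, the coefficient values and the covariate values are all made up for illustration; only the form of the model comes from the notes.

```python
import random

def simulate_responses(x_rows, betas, sigma, seed=0):
    """Draw Y = beta_0 + beta_1*x_1 + ... + beta_k*x_k + eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    ys = []
    for row in x_rows:
        # Height of the plane at this covariate combination (the mean response).
        mu = betas[0] + sum(b * x for b, x in zip(betas[1:], row))
        # Add normal scatter so the point falls above or below the plane.
        ys.append(mu + rng.gauss(0.0, sigma))
    return ys

# Illustrative values only (not from the notes): k = 2 covariates.
x_rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
y = simulate_responses(x_rows, betas=[1.0, 2.0, -0.5], sigma=0.3)
```

Setting sigma to 0 puts every simulated response exactly on the plane; increasing it spreads the points further from the plane.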

How can we tell when this model is suitable, i.e. when the relationship is planar?

$k=2$: use a spinner (rotating 3-D scatterplot) and look for a view in which the points line up edge-on
$k \ge 2$: use coplots, and check that the trends in all panels are parallel

Coefficients

Interpretation

$\beta_0$ = average response when all covariates are 0.
$\beta_1$ = slope of the plane in the $x_1$ direction, i.e. the average change in response for a unit change in $x_1$ when all other covariates are held constant. It is also the slope in a coplot of $y$ vs. $x_1$. The remaining $\beta_j$ are interpreted in the same way.

Estimation

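Choose the estimates $\hat{\beta}_0, \ldots, \hat{\beta}_k$ to minimise the least squares criterion $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2$. This is done in R using the lm function.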
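As a rough illustration of what minimising the least squares criterion involves, the sketch below solves the normal equations $(X^T X)\beta = X^T y$ directly in plain Python. This is not how R's lm works internally (lm uses a QR decomposition, which is numerically more stable); the function name and the data are hypothetical and chosen so the answer is easy to check.

```python
def fit_least_squares(x_rows, y):
    """Return [b0, b1, ..., bk] minimising sum((y_i - b0 - b1*x_i1 - ... - bk*x_ik)^2)."""
    # Design matrix: a leading 1 for the intercept, then the covariates.
    X = [[1.0] + [float(v) for v in row] for row in x_rows]
    p = len(X[0])
    # Build the augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The matrix is now diagonal, so each coefficient is read off directly.
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up data lying exactly on the plane y = 1 + 2*x1 - 0.5*x2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1.0, 3.0, 0.5, 2.5, 3.5]
coefs = fit_least_squares(x_rows, y)
```

Because the made-up data lie exactly on a plane, the fitted coefficients should reproduce that plane up to rounding error. With real data the residuals would be non-zero and the fit would only approximate the points.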

How well does the plane fit?

Judge by looking at the fitted values and the residuals (differences between observed and fitted values). In a good fit, the residuals will be small compared to the $y$'s and will be randomly distributed. For the model to be useful, we need a strong relationship between the response and the explanatory variables.

How do we measure this? Think about predicting a new response.

If we don’t know the $x$'s, we’d predict $y^{*} = \bar{y}$, with prediction error $\frac{1}{n-1} \sum (y_i - \bar{y})^2$. If we do know the $x$'s, then we’d predict $y^{*} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$, with prediction error $\frac{1}{n-1}\sum (y_i - \hat{y}_i)^2 = \frac{1}{n-1}\sum e_i^2$, where the $e_i$ are the residuals.

The ratio of these prediction errors is $RSS / TSS$, where $RSS$ = residual sum of squares and $TSS$ = total sum of squares. The smaller this ratio is, the better the fit. Since RSS + RegSS = TSS, where RegSS = regression sum of squares, and none of these can be negative, $RSS / TSS$ must lie between 0 and 1.
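For reference, the pieces of this decomposition can be written out explicitly. The definition of RegSS below is the standard one for a least squares fit that includes an intercept; it is not spelled out in these notes, so treat it as an assumption here.

$$
TSS = \sum (y_i - \bar{y})^2, \qquad RSS = \sum (y_i - \hat{y}_i)^2, \qquad RegSS = \sum (\hat{y}_i - \bar{y})^2
$$

$$
\frac{\frac{1}{n-1}\sum (y_i - \hat{y}_i)^2}{\frac{1}{n-1}\sum (y_i - \bar{y})^2} = \frac{RSS}{TSS} = 1 - \frac{RegSS}{TSS}
$$

Because RSS and RegSS are both sums of squares, neither can be negative, so the ratio $RSS / TSS$ is confined to the interval from 0 to 1.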

Another way is to look at $RegSS / TSS = 1 - RSS / TSS$. This is known as the coefficient of determination, or $R^2$. A large $R^2$ is desirable: 1 means a perfect fit, 0 means a flat plane (the covariates explain none of the variation in the response).

$\sigma^2$ is also related to $R^2$: more scatter about the plane means a poorer fit. $\sigma^2$ is estimated by $RSS / (n-k-1)$.
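The fit measures above can be computed in a few lines once the fitted values are available. The sketch below assumes the fitted values come from a least squares fit with an intercept and $k$ covariates; the function name and the numbers are made up purely for illustration.

```python
def fit_summary(y, fitted, k):
    """Return RSS, TSS, R^2 and the estimate of sigma^2 for a fit with k covariates."""
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return {
        "RSS": rss,
        "TSS": tss,
        "R2": 1.0 - rss / tss,            # coefficient of determination
        "sigma2_hat": rss / (n - k - 1),  # estimate of the error variance
    }

# Made-up observed and fitted values, for illustration only.
y = [2.0, 4.0, 6.0, 8.0, 10.0]
fitted = [2.5, 3.5, 6.0, 8.5, 9.5]
summary = fit_summary(y, fitted, k=1)
```

Here the residuals are small relative to the spread of the $y$'s, so $R^2$ comes out close to 1 and the estimate of $\sigma^2$ is small.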