General Overview

Let's say that you are studying a type of chicken and you have reason to believe that its weight will give you some indication of how much food it will eat in a year (a fairly reasonable thing to suspect). Ideally, we would eventually like some sort of function that takes hen weight as input and returns an estimate of how much feed we might expect the hen to consume.

So, you go out and weigh a hen, and it turns out to weigh 4.6 units and consume 87.1 units of feed. From this single data point, it would be impossible to tell if a hen that weighed more would eat more (which is what we suspect) or would eat less. Thus, we go out and collect another data point. This time the hen weighs 5.1 units and eats 93.1 units. If we assume that there is some sort of linear relationship between a hen's weight and the amount of feed it consumes, then we can use the two data points to solve for the unknown parameters in our model: an intercept (which we will call $ \beta_0$) and a slope (which we will call $ \beta_1$). Thus, using the following two equations

$\displaystyle 87.1$ $\displaystyle = \beta_0 + \beta_1(4.6)$    
$\displaystyle 93.1$ $\displaystyle = \beta_0 + \beta_1(5.1)$    

and standard algebraic techniques, we can determine that $ \beta_0 = 31.9$ and $ \beta_1 = 12$. Thus our model is:

$\displaystyle f(x) = 31.9 + 12x.$ (3.13.1)
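As a quick check, the two equations above can also be solved numerically. This is a minimal sketch using NumPy (not part of the original text); the variable names are mine:

```python
import numpy as np

# The two equations 87.1 = b0 + b1*4.6 and 93.1 = b0 + b1*5.1,
# written as a 2x2 linear system A @ [b0, b1] = y.
A = np.array([[1.0, 4.6],
              [1.0, 5.1]])
y = np.array([87.1, 93.1])

beta_0, beta_1 = np.linalg.solve(A, y)
print(beta_0, beta_1)  # approximately 31.9 and 12.0
```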


Table: Average body weight $ X$ and food consumption $ Y$ for 50 hens from each of 10 White Leghorn strains (350-day period). Source: Plagiarized from Steel, Torrie and Dickey [2]. Data from S. C. King, Purdue University

Body weight, $ X$    Food consumption, $ Y$
4.6                  87.1
5.1                  93.1
4.8                  89.8
4.4                  91.4
5.9                  99.5
4.7                  92.1
5.1                  95.5
5.2                  99.3
4.9                  93.4
5.1                  94.4


After measuring several more points (Table 3.13.1), you realize that none of them falls on the line defined by $ f(x)$, except for the first two, which were used to create the model (see Figure 3.13.1).
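To see this numerically, the following sketch (using NumPy; not part of the original text) evaluates Equation 3.13.1 at every tabulated weight and counts how many points the line under- and over-estimates:

```python
import numpy as np

# Leghorn data from Table 3.13.1.
weight = np.array([4.6, 5.1, 4.8, 4.4, 5.9, 4.7, 5.1, 5.2, 4.9, 5.1])
feed = np.array([87.1, 93.1, 89.8, 91.4, 99.5, 92.1, 95.5, 99.3, 93.4, 94.4])

predicted = 31.9 + 12.0 * weight   # Equation 3.13.1
residual = feed - predicted        # positive => the line predicts too low

# A small tolerance ignores rounding noise on the two points
# that were used to fit the line (their residuals are zero).
low = int(np.sum(residual > 0.1))
high = int(np.sum(residual < -0.1))
print(low, "underestimated,", high, "overestimated")
```

Only the heaviest hen (5.9 units) is overestimated; every other point not used to fit the line is underestimated, just as Figure 3.13.1 shows.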

Figure: A plot of the Leghorn data from Table 3.13.1 with a line drawn using the first two points to define the slope and the intercept (Equation 3.13.1). Notice how poorly this line estimates the other data points: with a single exception, the estimates made by Equation 3.13.1 are too low. Compare this with the graph shown in Figure 3.13.2.
\includegraphics[width=3in]{leghorn_bad}

Figure: A plot of the Leghorn data from Table 3.13.1 with a line drawn using Least Squares to estimate the slope and the intercept. Notice how, even though this line passes through fewer points than Equation 3.13.1, shown in Figure 3.13.1, it tends to lie closer to the majority of the data.
\includegraphics[width=3in]{leghorn_good}

At this point we might realize that it was fairly arbitrary to decide to use the first two points to create our model. We could have used the second and the third, or the fourth and fifth, but using any specific pair of points to define our model doesn't make it any less arbitrary. What we would really like to do is use all of the data that we have collected to create our model. Since it is obvious that all of the data does not fall on a single line, we would like to create our model in such a way that the difference between the points that the model predicts and the observed data is minimized (see Figure 3.13.2). This section describes both a method, called Least Squares, for creating models that achieve this, and a means to evaluate the properties of these models. As the examples will show, this method works well with a wide range of data, not just simple $ (x, y)$ pairs.

Least squares is a method for estimating the parameters of linear functions (or, in more technical jargon, functions that are linear with respect to their coefficients) such that the sum of squares of the differences between the $ y$-values of the data points and the corresponding $ y$-values of the approximating function is minimized.
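To make this criterion concrete, here is a sketch (NumPy; the helper name `sse` is mine, not the text's) that fits a least-squares line to the Leghorn data and confirms that its sum of squared errors is smaller than that of the two-point line of Equation 3.13.1:

```python
import numpy as np

# Leghorn data from Table 3.13.1.
weight = np.array([4.6, 5.1, 4.8, 4.4, 5.9, 4.7, 5.1, 5.2, 4.9, 5.1])
feed = np.array([87.1, 93.1, 89.8, 91.4, 99.5, 92.1, 95.5, 99.3, 93.4, 94.4])

# Least-squares slope and intercept for a degree-1 polynomial.
slope, intercept = np.polyfit(weight, feed, deg=1)

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return float(np.sum((feed - (b0 + b1 * weight)) ** 2))

print("least squares SSE:", sse(intercept, slope))
print("two-point line SSE:", sse(31.9, 12.0))
```

By construction, no other straight line can have a smaller sum of squared errors on this data than the least-squares line.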

We start by considering a linear model of the form

$\displaystyle y_i = \beta_{0} + \beta_{1}x_{i,1} + \cdots + \beta_{m}x_{i,m} + \epsilon_i,$ (3.13.2)

where $ i = 1,\ldots, n$ indexes the $ n$ observations. This system of $ n$ equations can be written quite concisely in matrix notation as

$\displaystyle \mathbf {Y = X \boldsymbol{\beta} + \boldsymbol{\epsilon}},$ (3.13.3)

where Y, called the dependent variable, is an $ n \times 1$ vector of observed measurements, $ \boldsymbol{\beta}$ is an $ (m+1) \times 1$ vector of unknown model parameters, X, called the independent variables or the design matrix, is an $ n \times (m+1)$ matrix whose first column is all ones (carrying the intercept $ \beta_0$) and whose remaining columns hold the independent variable values, and $ \boldsymbol{\epsilon}$ is an $ n \times 1$ vector of measurement noise.
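For the single-predictor Leghorn data, this setup can be sketched as follows (NumPy; this assumes the intercept is carried by a leading column of ones in X, and solves the normal equations $ \mathbf{X'X}\boldsymbol{\beta} = \mathbf{X'Y}$, which is one standard route to the least-squares estimate):

```python
import numpy as np

# Leghorn data from Table 3.13.1: one predictor (weight), n = 10.
weight = np.array([4.6, 5.1, 4.8, 4.4, 5.9, 4.7, 5.1, 5.2, 4.9, 5.1])
Y = np.array([87.1, 93.1, 89.8, 91.4, 99.5, 92.1, 95.5, 99.3, 93.4, 94.4])

# Design matrix: a column of ones for the intercept, then the predictor.
X = np.column_stack([np.ones_like(weight), weight])

# Solve the normal equations (X'X) beta = X'Y for beta = (beta_0, beta_1).
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)

# Equivalent, and numerically preferable for ill-conditioned X:
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

In practice `np.linalg.lstsq` (or a QR-based solver) is preferred over explicitly forming $ \mathbf{X'X}$, but the normal-equations form matches the derivation that follows.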



Frank Starmer 2004-05-19