Prototyping with tools

Next: Linear Models with Multiple Up: Linear Models Previous: Properties of   Index

Click for printer friendely version of this HowTo

## Hypothesis Testing

At this point we are ready to start evaluating our estimated parameters. For example, if a parameter is very close to zero, it could mean that the data scaled by is insignificant in our model. If all of the values for the parameters are similar to each other, it could mean that there is no difference between the different data sets that form X. This is where statistics comes in. Depending on the variability in our data, our hypothesis that is sufficiently close to zero to be insignificant, may or may not be rejected. Statistics tells us how close, based on the data, the parameter needs to be to zero for it to be close enough accept our hypothesis (or, how far does the parameter need to be from zero to reject our hypothesis).

The first thing we need to learn how to do is to convert our hypotheses into matrix notation so that we can use them to constrain our estimates of the parameters, just as we did when we created LRTs (see Section 3.10). The best way to learn this is to just see a few examples.

no_titleno_title

Example 3.13.5.2 (no_title)

If we want to test if is zero, then we set up the matrices (or, rather, vectors in this case):

to to be the constraint that we use when we try to estimate . The matrix on the left is generally called the contrast matrix, and we will refer to it with C.

no_titleno_title

Example 3.13.5.4 (no_title)

If our model had four parameters and we wanted to test to see if they were all equal3.16, we would use the following constraint.

If, given the data, there is a high probability that this system of equations is correct, then all of the parameters must all be equal.

In general our hypotheses will take the form

where C is an matrix where is the number of tests in our hypothesis and is the number of parameters we are estimating and is just a generalization of the vector of zeros that tend to end up right side of the constraints. It is usual for to contain only zeros but sometimes it can contain other numbers as well.

We can test our hypotheses by creating a LRT. That is, by comparing the unconstrained optimization of the least squares residual with a constrained optimization of the least squares residual. In Section 3.13.3 we solved for the unconstrained optimization and the result was Equation 3.13.6. To solve for the constrained optimization, that is

 (3.13.9)

we will use the method of Lagrange Multipliers (see Section 3.11 for details on this method). Thus, we want to find solutions to the equations:

 (3.13.10)

and

or

 (3.13.11)

Taking the gradient of Equation 3.13.10, we have

Thus, in terms of , the constrained optimization for is

 (3.13.12)

We can solve for by multiplying both sides of the equation by C and incorporating the constraint, Equation 3.13.11, into our solution. That is,

from Equation 3.13.11, and,

thus,

 (3.13.13)

We can substitute this solution into Equation 3.13.12 giving us the optimal solution for in terms of X, Y, and C only.

The LRT for the hypothesis is thus,

The constrained and unconstrained estimates for derived using least squares are equivalent to those found using maximum likelihood methods (see Section 3.9) when we assume that the elements of , are independent and normally distributed (just as we did when the derived the mean and variance for ). Intuitively this makes since because the normal distribution is unimodal and symmetric about the mean. Least squares attempts to position the mean of the distribution so that the distance from the data points and the mean is minimized, and thus, the data points will tend to be in the middle of the distribution where probability is highest. Maximum likelihood methods attempt to position the mean so that the data points have maximum probability of occurring, and thus, clustered around the mean. Analytically, this equivalence is easy to derive. Using the maximum likelihood method described in Section 3.9, we have:

Solving for gives us the same result we found using least squares. That is,

Using maximum likelihood to estimate under the constraint (Equation 3.13.11) results in the least square estimate, (Equation 3.13.14).

Now we need to solve for the unconstrained and constrained estimates of . We'll begin with solving for the unconstrained estimate, using maximum likelihood methods.3.17That is,

thus,

Solving for the constrained estimate, , is quite simple in that it amounts to substituting for in Equation 3.13.15. Thus,

 (3.13.14)

However, in order for it to be usable in a convenient way, we must simplify it and this is quite involved.

We'll start our simplification of by first noting that

Thus,

While this is formidable, the second and third terms equal zero3.18 and the last term reduces to3.19

 (3.13.15)

If we multiply Equation 3.13.17 by -1 twice, (that is, by 1), we get the equivalent and more common form,

Thus,

Now that we have MLEs for all of the parameters in the model, we can put together a likelihood ratio test.

If Equation 3.13.18 is smaller than some constant 3.20, then the estimated variance under our hypothesis, , would be much greater than the unconstrained estimated variance. Thus, we may infer that our hypothesis is much less accurate and less likely to be correct.

If we invert our test,

 (3.13.17)

then we can reject our hypothesis when it is greater than .

In order to determine what is, we will use the same methods that are used to determine constants for all other likelihood ratio tests (for example, the -test or the -test). What we will do is set up the inequality to define the rejection region

 (3.13.18)

and use algebra to modify both sides until we can recognize the form of the left side. If the form on the left side is a standard distribution, (and in this case, since we have a ratio of variances we'll end up with an -distribution), then we can use standard tables for this distribution to determine the value for the resulting . Thus,

All we have done so far is raise each side of the inequality to the power. It is important to note that at this point we are no longer interested in the value for but in . That is why we can substitute for . Throughout this derivation we will be modifying and simply replacing the old constant with a new one, , since we will only be interested in the value of which will eventually be determined by a standard distribution. This concept is fairly hard to grasp at first, but once you see the result, it may make more sense. Continuing with our derivation, we have,

and

 (3.13.19)

where is the number of tests (or rows in C), is the total number of observations (or rows in X or Y), and is the number of parameters we are estimating (or columns in X). We can show that Equation 3.13.21 is an -distribution 3.21 by showing that the numerator and the denominator are both times chi-square variables divided by their degrees of freedom. Intuitively, and are random variables with normal distributions in the numerator and denominator respectively. These appear as quadratic forms (that is, in the form ABA), and thus squares of these variables. The square of any random variable has a chi-square distribution. See Appendix D.7 for a more complete treatment of this proof.

Since we are able to show that the left hand side of Equation 3.13.21 has an -distribution, we can use a standard table to determine the value of . That is, , where is the probability we are willing to risk rejecting the hypothesis even when it is true.3.22

no_titleno_title

Example 3.13.5.6 (no_title)

Given the Leghorn data in Table 3.13.1 we can try to fit the model,

to the data. Thus, setting up our Y and X matrices we have,

Using the Octave program listed in Appendix A.1, which simply implements the test developed in Section 3.13.5, we can test different hypotheses.

To test whether or not body weight is an indicator of food consumption, that is, whether or not , we use the contrast matrix:

Our constraint for estimating the parameters is thus,

and the output from our program is:
octave:2> general_linear (x, y, [0, 1]);
beta =

55.2633
7.6901

f_test = 16.232
p_value = 0.0037939
t_test = 4.0289
Thus, since the p-value is much smaller than 0.05, we can conclude that and that body weight does indeed give an indication of food consumption.

ANOVAno_title

Example 3.13.5.8 (ANOVA)

Table: Study effect of different levels of PCB on body weights (in grams) of mice. Source: Plagiarized from Roger L. Berger, [3]
 0 ppm 62.5 ppm 250 ppm 1000 ppm 55 47 49 36 47 51 44 41 46 40 46 53 44 51 47

Given the data in Table 3.13.2,a modern ANOVA model3.23

is easily computed using Equation 3.13.21. We can set up our matrices as follows:

To test the hypothesis that there is no difference in the treatments (that is, all of the parameters are equal), we create the contrast matrix:

The constraint on our parameter estimation is thus,

Using our Octave program, we get:
> c = [1, 0, 0, -1;
> 0, 1, 0, -1; 0, 0, 1, -1]
c =

1   0   0  -1
0   1   0  -1
0   0   1  -1

> general_linear (x, y, c);
beta =

50.250
45.500
47.600
38.500

f_test = 4.3057
p_value = 0.030746
Thus, since our p-value is less than 0.05, we can reject our hypothesis that there is no difference in the treatment means.

Randomized Block Designno_title

Example 3.13.5.10 (Randomized Block Design)

Sometimes there are extra sources of variation in your data that you can clearly define. For example, in a laboratory, 5 different people may have run the same experiment and recorded their results. We could treat all of the experiments equally, ignoring the fact that we know that 5 different people were involved and just use standard ANOVA for hypothesis testing, or we could try to take into account the fact that some people may be more precise than others when making measurements and that some of the variation in the data may be due to this. The goal of this example is to demonstrate how this second option can work and how we can effectively make use of the extra information at hand.

Before we begin, however, we will note that this method only makes sense in specific situations and that, with a little more data, we can use much more powerfull methods for analysis. In this example we are assuming that it would be unreasonable to have each person run the experiment more than once. If this were not the case, we would apply the methods explained in Example 3.13.5.6.

Consider the data set in Table 3.13.3, where we have a single experiment to compare three different methods for drying leaves after they have been rinsed performed by five different people.

Table: Study of drying methods (measured in seconds till try) Source: Plagiarized from Steel, Torrie and Dickey, [2]. Data from Tucker et al.
 Person Applying Treatment (Block) Treatment 1 2 3 4 5 Control 950 887 897 850 975 Blotted 857 1,189 918 968 909 Air Current 917 1,072 975 930 954

The term Block refers to how randomization was used to set up the experiemental design. Here we are considering each person to represent a block and that within each block, the order in which the different drying methods is applied is randomized. That is to say, Person (Block) #1 may have used the blotting method first, then the control method and then the air current method. Person (Block) #2 may have started with the control method, then used the air current method and then the blotting method. This randomization within blocks is important to make sure that bias is not introduced to the data by where a treatment is applied. It could be that if one treatment was always done last, then it could have a bias due to the student wanting to get done quickly and go home for the evening.

For the data in Table 3.13.3, the model is:

where , , , , and , are indicator variables (dummy variables) for which block the sample came from and , and are indicator variables for which treatment was applied to the sample. Thus, it would make sense for Y and X to be:

and while Y is fine, X is a singular3.24matrix and we will not be able to invert X'X. A common solution to this problem is to use tricks that are often applied to traditional ANOVA models (ANOVA models that contain a general location parameter or overall mean and the remaining parameters are deviations from that mean) to reparameterize .

In order to reparameterize , we will consider there to be a common mean for all of the data, and that the effects of different blocks and treatments represent deviations from this mean. Thus, each data point . It is also important to note that due to Equation 3.13.7 (where we show that our estimates in are unbiased), the sum of the block variations and the sum of the treatment variations are equal to zero and that any particular block variation or treatment variation can be derived from the others. That is to say,

 (3.13.20)

and thus,

and

Since we can derive both and from the other deviations, we do not need to estimate values for them and as such, we can omit them from . By adding and removing and , has one fewer row than before and as a result, we can remove a column from X and it will become non-singular. Thus,

and design matrix then becomes

To test the hypothesis that there is no difference in the treatments is we want to know if the treatment deviations are all zero. We can create a contrast matrix to directly test to see if and are zero and if they are, by Equation 3.13.23, we can infer that is also zero. Thus,

Using Octave we get:
> general_linear (x, y, c);
beta =

949.867
-41.867
99.467
-19.867
-33.867
-38.067
18.333

f_test = 0.82474
p_value = 0.47244
and since the p-value is greater than 0.05, we will conclude that there is no difference in the three treatments. It is worth noting that if we had ignored the blocks and just used regular ANOVA, than the f-statistic would have been 0.71 and the p-value would have been 0.5128, demonstrating that by not ignoring the blocks, we have increased the precision of our test, even though we would have come to the same conclusion.

If we wanted to test the hypothesis that there was no difference in the different blocks (in this case this would mean that each person applied the treatment in more or less the same way), than we would have to test to see if the block deviations were all zero. Using the same logic used to create the contrast matrix for the treatment hypothesis we have:

Using Octave to calculate an F-statistic and a p-value, we get:
general_linear (x, y, c);
beta =

949.867
-41.867
99.467
-19.867
-33.867
-38.067
18.333

f_test = 1.5022
p_value = 0.28882
and once again, we would fail to reject our hypothesis and conclude that there is no significant difference between the blocks.

At this point it is worth noting that if we had more than one measurement per person/drying method combination, then we would not have had to do the reparameterization. With only one measurement per combination, we are forced to assume that there is no interaction between the person and the drying method. With more measurements, we do not have to make this assumption and we would have treated the data as we would a 5x3 factorial experiment. How this is done is shown in Examples 3.13.5.6, 3.13.5.7 and 3.13.5.8.

2x2 Factorialno_title

Example 3.13.5.12 (2x2 Factorial)

Table: Study of mouse learning times, testing both how maternal diet (calories derived from ethanol) and age might effect how many times a mouse repeats a test before passing. Source: Plagiarized from M. Plonsky. Analysis of Variance - Two Way http://www.uwsp.edu/psych/stat/13/anova-2w.htm
Age
0%
 5 4 3 4 2
 6 7 5 8 4
35%
 18 19 14 12 15
 6 9 5 9 3

Table 3.13.5: Average value for each factor combination.
 Age Maternal Diet Adolescent Mature 0% 3.6 6 35% 15.6 6.4

Sometimes you are interested in testing more than one variable at a time. For example, you might want to do this to determine if there are any interactions between two drugs at different dosages. Such an interaction might prevent the two drugs from being as effective together as they would be taken alone. A doctor might find this information helpful when deciding how to make prescriptions. In this case, you would need to use some type of factorial method to test your hypothesis. In this case, the the different drugs would be called Factors and the different dosages of each drug would be called Levels. If we are interested in only two different dosages per drug, then we would use what is called a 2x2 factorial method (2x2 indicating that there is a total of 4 different drug/dosage combinations). In general, factorial method can be used to determine if there is any interactions between the two factors and if there are none, it can then determine the significance of the effects of the different levels of the factors.

In a completely randomized experiement with multiple factors, there is no attempt to impose any type of blocking structure. That is to say, the randomization occurs at the top most level of organization (i.e. we randomly select a level from one factor and apply it with a randomly selected level from another factor). In a blocking design, we first designate the blocks and only randomize within the blocks.

The general model used for a 2x2 factorial method

where are indicator variables that show that the measured value came from a particular combination of factors.

Given the data in Table 3.13.4, we can apply the 2x2 factorial method to determine if there is any interaction between a mouse's age and how much prenatal ethanol its mother consumed when determining how quickly it can learn something new. We might speculate that older mice are going to be slow learners regardless of their prenatal environment, where as young mice might be heavily influenced this factor. We can further validate this hunch by creating what is called an interaction plot, Figure 3.13.3, by plotting the average responses for the adolescent and mature mice given the two different conditions (data from Table 3.13.5). In this case we see the two lines are not parallel and this indicates that the age of the mouse potentially changes the size of the effect that the prenatal environment has on its ability to learn.3.25

Since the model we will be using is a linear model, we can apply our general method for hypothesis testing. Thus, the matrices for Y and X are:

To test if there is interaction effects between the factors, and Figure 3.13.3 indicates that this is probably the case, we want to test if the slopes in the plot are equal. That is, and thus, we define the contrast matrix,

and our Octave program gives us the results:
> general_linear (x, y, c);
beta =

3.6000
15.6000
6.0000
6.4000

f_test = 35.598
p_value =  1.9738e-05
t_test = 5.9664
The tiny p-value confirms the intuition gained from Figure 3.13.3.

NxM Factorialno_title

Example 3.13.5.14 (NxM Factorial)

If there are more than two levels per factor, then the only thing that changes is the number of columns in the design matrix, to account for the larger number of possible combinations between factors, and the number of rows in the contrast matrix.

For example, if you were to do a 3x2 factorial, X would have 6 columns and in order to test for interactions, C would require 2 rows. The need for the extra row in C is illustrated in Figure 3.13.4, where the two groups of line segments would need to be tested to see if their slopes are equal.

2x2x2 Factorialno_title

Example 3.13.5.16 (2x2x2 Factorial)

By adding an additional factor, you add an extra dimension to types of interactions that can take place. Instead of a two dimensional table to describe the combinations of different levels of different factors, you now need a three dimensional cube. Additional factors require additional dimensions. Fortunately for us, however, all of these extra dimensions can be represented by multiple two dimensional tables.

Consider the data in Table 3.13.6. If we let Body Fat to be factor A, Gender to be factor B and Smoking History to be factor C, we can represent this data with the model:

where is 1 if the data came from that combination of factor levels.

Table: Study of how Body Fat, Smoking and Gender effect how long it takes to reach fatigue on an exercise bike. Source: Plagiarized from Neter et al., [10]
Low Body Fat
Smoking History
Gender Light Heavy
Male
 24.1 29.2 24.6
 17.6 18.8 23.2
Female
 20 21.9 17.6
 14.8 10.3 11.3

High Body Fat
Smoking History
Gender Light Heavy
Male
 14.6 15.3 12.3
 14.9 20.4 12.8
Female
 16.1 9.3 10.8
 10.1 14.4 6.1

Table 3.13.7: Average value for each Factor/Level combination.
Low Body Fat
Smoking History
Gender Light Heavy
Male
 25.967
 19.867
Female
 19.833
 12.133

High Body Fat
Smoking History
Gender Light Heavy
Male
 14.067
 16.033
Female
 12.067
 10.2

Now that we have the model, the question becomes, how do we test hypotheses about it. Can we use it to determine if Body Fat has an effect of its own on how quickly people get fatigued or is it confounded with the other factors? What about smoking? Do differences in how quickly someone is fatigued depend on all three factors working simultaneously? With carefully formulated contrast matrices we can answer all of these questions.

Before we start, however, it is important to understand that you need to begin by looking for the highest order interactions (in this case, the potential three way interaction between the three factors) before testing hypotheses about lower order interactions, which, in turn, need to be investigated before you look at whether or not any particular factor has a main effect (operates on its own). This is because if there are higher order interactions involving the specific factor you are interested in testing for main effects, than the data set is not usable for the kind of hypothesis you have in mind. Despite this, we begin by formulating the contrast matrices to test for main effects. The reason we do this, however, is that once we have the matrices needed to test for main effects, we can use them to derive all of the other contrast matrices required test for any possible higher order interactions. Thus, even though we start by creating the contrasts to test for main effects, we do not use them individually until the very end of the analysis if we use them at all.

Consider the data from Table 3.13.7 as it is plotted in Figure 3.13.5. If we assumed that there were no higher order interactions, we could test to determine if Body Fat has a main effect by testing to see if the slopes of the lines between Low and High were zero. We can do this by the usual way by calculating the slopes and then determining if they are significantly different from zero or not. Thus, the contrast matrix for the hypothesis that there is no main effect (the slopes are zero) is:

Similarly, to test for whether or not Gender has a main effect, we would want to test for whether or not the slopes in Figure 3.13.6 were zero or not. The hypothesis that there is no main effect is encoded into the contrast matrix:

Figure 3.13.7 implies that in order to test for a main effect from Smoking, we must define the contrast matrix as follows:

Now, all that we need to do to test hypothesis about any higher order interactions is multiply, column by column, the contrast matrices associated with each factor in the interaction. For example, to test for an interaction between Gender and Smoking, we create the new contrast matrix

This contrast matrix is equivalent to testing whether the slopes of the lines within each graph in Figure 3.13.6 are equal.

To test a hypothesis about a three way interaction between Body Fat, Gender and Smoking, we simply multiply the columns of all three main effect matrices together. This contrast matrix is

Intuitively, one can also imagine that this contrast matrix tests to determine if the slopes between the graphs in Figure 3.13.6 are equal (see Figure 3.13.8).

 1. 2.

As before, to test our hypotheses we create Y and X

Using our contrast matrix for testing for three way interactions between Body Fat, Gender and Smoking,

we can use our Octave program and the output is:
> general_linear(x, y, c);
beta =

25.967
19.867
19.833
12.133
14.067
16.033
12.067
10.200

f_test = 0.20036
p_value = 0.66043
t_test = 0.44761
Since the p-value is much larger than 0.05, we will fail to reject the hypothesis that there is no three way interaction between Body Fat, Gender and Smoking. That is to say, there is no three way interaction between the three factors in the study. Thus, we are free to test hypotheses about any possible two way interactions that might be present.

Moving on, we now test for interactions between Gender and Smoking. As described above, we define the contrast matrix:

and octave gives us:
> general_linear(x, y, c);
beta =

25.967
19.867
19.833
12.133
14.067
16.033
12.067
10.200

f_test = 1.1859
p_value = 0.29230
t_test = 1.0890
leaving us to conclude that, since the p-value is greater than 0.05, there are no interactions between Gender and Smoking.

To test if there are interactions between Body Fat and Smoking,

and octave gives us:
> general_linear(x, y, c);
beta =

25.967
19.867
19.833
12.133
14.067
16.033
12.067
10.200

f_test = 7.7612
p_value = 0.013221
t_test = 2.7859
so that we would reject the hypothesis that there is no interaction between Body Fat and Smoking.

To test for interactions between Gender and Body Fat,

and octave gives us:
> general_linear(x, y, c);
beta =

25.967
19.867
19.833
12.133
14.067
16.033
12.067
10.200

f_test = 1.4622
p_value = 0.24414
t_test = 1.2092
and we would fail to reject the hypothesis that there is no interaction between Gender and Body Fat.

Finally, since we have determined that Gender is the only factor that is not in a two interaction, we can determine if has a main effect. To test if the hypothesis is that there is no main effect, the contrast matrix is, as defined above,

and octave gives us:
> general_linear(x, y, c);
beta =

25.967
19.867
19.833
12.133
14.067
16.033
12.067
10.200

f_test = 18.915
p_value = 0.00049705
t_test = 4.3492
The small p-value tells us to reject our hypothesis and conclude that there is a main effect for Gender.

Blocked 2x3 Factorialno_title

Example 3.13.5.18 (Blocked 2x3 Factorial)

It is possible to combine blocking with the factorial method for analyzing data. Using blocking implies that you are assuming that some of the potential interactions do not exist while leaving room for the possibility with other interactions.

Consider the data in Table 3.13.8. Here we have two factors, Major and Grade, that have been separated into different blocks defined by Teacher (the measured value is Score). Notice that just like in Example 3.13.5.5, we only have one measurement for each cell in each block. This prevents us from using the full models from Examples 3.13.5.6 and 3.13.5.8 that allowed us to test for all possible interactions. Instead, we are forced to assume that there is no three way interaction between Teacher, Major and Grade. However, since we have multiple measurements for each Major/Grade combination, we can test for two way interactions between these two factors.

Table: Data for a 2x3 Factorial with Blocking. Teacher is used for blocking and Major and Grade are the two factors. The measured value is Score. Source: Plagiarized from Tanya Balan.
Teacher
(Block)
#1
 Grade Major Jr. Sr. Gr CS 80 80 80 MA 85 90 96

#2
 Grade Major Jr. Sr. Gr CS 75 80 84 MA 80 88 97

#3
 Grade Major Jr. Sr. Gr CS 75 80 85 MA 75 80 100

Given the restrictions of the data, the model is thus,

where , and are the indicator variables for Teacher, Major and Grade, respectively, and are the indicator variables for the different types of two way interactions.

As with our previous experience with blocking in Example 3.13.5.5, this equation will lead to a singular design matrix and thus we must reparameterize in much the same way we did before. The only significant difference is how we rewrite the interaction terms. Here, we will use the methods we learned in Example 3.13.5.8, where we built the interaction contrasts from the main effect contrasts. Instead of multiplying each column in contrast matrices, we will multiply the rows that determine which specific combination of Major and Grade a particular score is associated with to create the columns that specify the interaction. Thus, the design matrix is:

where Column 1 is the overall mean, , the Columns 2-3 characterize which Teacher, or block, the data comes from, Column 4 specifies which Major the data comes from, Columns 5-6 determines which Grade the data comes from and Columns 7-8, the product of Column 4 and Columns 5-6, specifies the (Major Grade) interaction.

As with the factorial method, we need to evaluate the presence of interaction before we determine the effects of the individual factors. To do this, we make the null hypothesis that there is no interaction. This is equivalent to assuming the parameters associated with Columns 7 and 8 are both equal to zero. The contrast matrix is thus,

Using our Octave program, we test our hypothesis:

> general_linear (x, y, c);
beta =

84.16667
1.00000
-0.16667
-4.27778
-5.83333
-0.33333
2.61111
0.44444

f_test = 5.3515
p_value = 0.026292
and, due to the small p-value, reject it.

Since we have concluded that there is indeed interactions between Major and Grade, we can end our inquiry here.

Split-plot Designno_title

Example 3.13.5.20 (Split-plot Design)

Split-plot designs are simply designs that incorporate multiple levels of blocking, and thus, only attempt to model a subset of the potential sources of variation. The more split'' you see in the name, like split-split-plot, of the design, the more levels of blocking there are in it. In this case, where we are just demonstrating split-plot, we let one factor determining the blocking at the top level, these blocks are then divided into two sub-blocks, or plots, with the levels of a second factor applied to each sub-block. These sub-blocks are then divided into sub-sub-blocks and the levels of a third factor are randomly distributed among these units. This is illustrated in Figure 3.13.9.
 1. 2. 3.
Conceptually, however, they do offer anything we have not seen before.

Consider the data in Table 3.13.9.

Table: Yield, in bushels per acre. There were six different fields (Blocks) and these blocks were divided into two sub-blocks, each sub-block receiving one of two different varieties of of soybean. Within each sub-block, five different row spacings were applied to the rows of soybeans. Source: Plagiarized from Steel, Torrie and Dickey, [2]. Data from J. W. Lambert, University of Minnesota
Field #1
 Row Spacing Variety 18 24 30 36 42 OM 33.6 31.1 33.0 28.4 31.4 B 28.0 23.7 23.5 25.0 25.7

Field #2
 Row Spacing Variety 18 24 30 36 42 OM 37.1 34.5 29.5 29.9 28.3 B 25.5 26.2 26.8 25.3 23.2

Field #3
 Row Spacing Variety 18 24 30 36 42 OM 34.1 30.5 29.2 31.6 28.9 B 28.3 27.0 24.9 25.6 23.4

Field #4
 Row Spacing Variety 18 24 30 36 42 OM 34.6 30.5 29.2 31.6 28.9 B 29.4 25.8 23.3 26.4 25.6

Field #5
 Row Spacing Variety 18 24 30 36 42 OM 35.4 30.7 30.7 28.1 18.5 B 27.3 26.8 21.4 24.6 24.5

Field #6
 Row Spacing Variety 18 24 30 36 42 OM 36.1 30.3 27.9 26.9 33.4 B 28.3 23.8 22.0 24.5 22.9

The table alone is enough to convince us that any hope we might have to model potential three way interactions is a lost cause. The fact that there is only one measurement per cell rules this out by not giving us enough degrees of freedom to model both a three way interaction and estimate the error. We are forced to simply assume that this type of interaction can not exist.

On the other hand, we can model a two way interaction between Field and Variety because we have multiple measurements per level of Variety in each Field. We can also model a two way interaction between Variety and Spacing because, across the different Fields, we can come up with several measurements for each Variety/Spacing combination. Thus, the model is:

 Field   Variety   Spacing FieldVarietyVarietySpacing

The more common form of this model is:

 Field   VarietyFieldVariety SpacingVarietySpacing

Since we do not have enough degrees of freedom to estimate all of the parameters in a full model (i.e. we are leaving out the three way interaction term) then we will have to reparameterize the design matrix in the same way we did in Example 3.13.5.9. This gives us a design matrix with sixty rows and twenty columns (thank goodness someone has already gone ahead and written a computer program to generate these for us, [5]). Since this is quite large, I will leave it to your imagination, however, a small bit of it can be found in Appendix B.1.

Analysis of the data follows that of any model that contains interaction terms. Start by analyzing those to determine if there is interaction. If not, then test for main effects. If there is, then, you have done your best.

The contrast matrix for testing for interaction between Variety and Spacing can be found in Appendix B.2 and the result of our test is:

> general_linear (x, y, c);
beta =

28.111667
0.228333
0.518333
0.238333
0.828333
-1.311667
2.821667
0.338333
0.408333
-0.311667
0.018333
-0.941667
3.363333
0.480000
-1.203333
-0.728333
0.853333
0.220000
0.436667
-0.671667

f_test = 1.3051
p_value = 0.28457
and the large p-value lets us conclude that there is no interaction between Variety and Spacing.

Since spacing is not tied up in another interaction term, like Variety, we can go ahead and test for whether or not it has a main effect. This contrast can be found in Appendix B.3 and octave tells us:

> general_linear (x, y, c);
beta =

28.111667
0.228333
0.518333
0.238333
0.828333
-1.311667
2.821667
0.338333
0.408333
-0.311667
0.018333
-0.941667
3.363333
0.480000
-1.203333
-0.728333
0.853333
0.220000
0.436667
-0.671667

f_test = 10.567
p_value =  6.1454e-06
Such a small p-value (far less than 0.05) allows us to conclude that Spacing does indeed have a main effect on the yield.

We will now test to see if the interaction term FieldVariety is significant. The contrast matrix for this can be found in Appendix B.4. The results of our test are:

> general_linear (x, y, c);
beta =

28.111667
0.228333
0.518333
0.238333
0.828333
-1.311667
2.821667
0.338333
0.408333
-0.311667
0.018333
-0.941667
3.363333
0.480000
-1.203333
-0.728333
0.853333
0.220000
0.436667
-0.671667

f_test = 0.61686
p_value = 0.68760
and once again, since the p-value is so large, we will fail to reject the hypothesis that the interaction term is insignificant.

Now that we have established that both Field and Variety are free of any interaction, we can test to see if they have any main effects. As we might expect Field ends up not having a main effect, but Variety does.

It is worth noting that often times statisticians will modify the formula used to calculate the F-statistic when they are testing for main effects for factors whose levels are not randomly applied within the sub-blocks. Instead of using the Mean Square Error (the estimation for the overall error in the model) in the denominator, they will use the mean square of BlockPlot interaction term. Doing so has the potential to increase the power of this test.

ANCOVAno_title

Example 3.13.5.22 (ANCOVA)

Analysis of covariance (ANCOVA) is very similar to ANOVA in that the models contain indicator variables for the various treatments. The differences comes from the fact that ANCOVA models also contain variables that represent continuous data, like weight or height. These continuous variables are called covariates because, even though they are not controlled by the experimenter, they are expected to have an influence on the response to the treatments. You can imagine how heavy and light people might have different response to different treatments'' of alcohol.

Oftentimes there are possible interactions between the covariates and the treatments. For example, if we were trying to determine if two different brands of fertilizer resulted in the same crop yield or not and we were measuring insect infestation as our covariate. We would expect that the greater the infestation, the less crop yield. Beyond this, it could be that one of the fertilizers contained something that insects enjoyed eating and thus, would be more susceptible more substantial infestations. It would then be difficult to tell if low crop yield for this fertilizer would be result of the fertilizer simply not working or due to the fact that the crop was eaten by the insects.

Thus, or analysis of covariance begins much the same way as it did for the blocking designs and the factorial data. We start by testing for interactions between the covariate and treatments. Using our example of two fertilizers and insect infestation, the model that leaves interaction as a possibility is,

 Fertilizer   Fertilizer Infestation   Infestation

where Fertilizer A and B are dummy 0/1 indicator variables exactly like the ANOVA model and Infestation A and B are infestation values measured for the different treatments (infestation is scored from 0 to 10, with 0 being no infestation and 10 being the most infestation).

Given 8 measurements for each treatment, the response vector and the design matrix are:3.26

We can test for interaction by testing whether the Infestation slopes are the same for the different fertilizers. That is,

and our general linear models program give us the results:
> general_linear (x, y, c);
beta =

17.86079
14.48320
-0.67178
-0.59096

f_test = 0.56192
p_value = 0.46793
t_test = 0.74961
Here the p-value is much larger than our 0.05 cut off, so that we will fail to reject our hypothesis that the slopes for the covariate given different treatments are different and thus, we can assume that there is no interaction between Infestation and Fertilizer.

Since we can ignore interaction, we can use a simpler model (one that leaves out the possibility for interaction) to test whether or not the fertilizers have the same effect on crop yield. This model simply lumps the two Infestation variables into one. We can do this because we failed to detect any difference between the two variables in the more complex model. Thus, our new model is:

 Fertilizer   Fertilizer   Infestation

and the design matrix becomes:

The contrast matrix for determining if there is a difference between the treatments is

and our program gives us the results:
> general_linear (x, y, c);
beta =

14.48603
11.51397
-0.62940

f_test = 60.644
p_value =  2.9992e-06
t_test = 7.7874
Such a small p-value causes us to reject the hypothesis that both Fertilizer treatments are the same.

NOTE: This second test, where we have lumped the covariate into a single variable for all of the treatments in the model and are testing for differences between the treatments, is equivalent to conducting the test under the adjusted treatment means hypothesis that is sometimes mentioned in other statistics texts.

Categorical Datano_title

Example 3.13.5.24 (Categorical Data)

Ever since 1969, linear models have been used to analyze categorical data [6].

Next: Linear Models with Multiple Up: Linear Models Previous: Properties of   Index

Click for printer friendely version of this HowTo

Frank Starmer 2004-05-19
>