### 8.4    Significance Testing

A significance test differs from a hypothesis test in only one way. In both, we assume the null hypothesis is true; then:
• Hypothesis Test:  specify a rejection region (for the desired significance level) at the outset, then check whether our test statistic falls into it.
• Significance Test:  compute the probability that the test statistic would be as extreme as (or more extreme than) the observed value if the null hypothesis were true. This probability is called the P-value. Reject the null hypothesis if the P-value is small enough.
ex:
Return to the Democrats example from the previous section. Suppose that we take our sample of 20 students, and find that X = 5 of them are Democrats. The P-value for this sample is the probability that we'd get a sample with this few Democrats in it, or even fewer, if the null hypothesis were true. Thus we assume that p >= .40 (that 40% or more of students at large are Democrats), and find the probability that X <= 5. Using p = .40, X has a binomial distribution with n = 20 and p = .4, and in this case P(X <= 5) = .1256. If we used any value of p greater than .40, the probability would be even less than this, so we take the boundary value and use P = .1256 as our P-value: this is the probability that we'd get a sample "this bad, or even worse," just due to sampling variation. In other words, there's about a 13% chance we'd get a sample with 5 or fewer Democrats in it if in fact the percentage of Democrats is 40%. We now have to decide if this probability is small enough to justify rejecting the null hypothesis.
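The binomial probability in this example can be checked with a short computation. The following is a minimal sketch in Python using only the standard library; the function name `binomial_cdf` is an illustrative choice, not something from the text.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p) by summing the individual terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Values from the example: n = 20 students, X = 5 Democrats, boundary value p = .40
p_value = binomial_cdf(5, 20, 0.40)
print(f"P(X <= 5) = {p_value:.4f}")  # about .1256

# A larger value of p makes a sample this low even less likely
print(binomial_cdf(5, 20, 0.50) < p_value)  # True
```

The second check illustrates why the boundary value p = .40 is the one used: any larger p gives a smaller probability.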

What's really the difference between a hypothesis test and a significance test? In the hypothesis test, we decide on the criterion for rejecting the null hypothesis before the sample is taken; if the results of the sample don't meet that criterion, we don't reject the hypothesis. In a significance test, we sample first, look at the result, and then decide whether it is unlikely enough for us to conclude that the null hypothesis is incorrect.

Which is better? Both are used, but statisticians prefer the approach used in the hypothesis test, as a way to "keep themselves honest."

• If we choose a significance level of .01 in designing a hypothesis test, and then compute our rejection region based on this, we will then only reject the null hypothesis if the sample we get is one of the "1% of worst samples" that would occur if the null hypothesis is true. If we get a sample that is close to our rejection region, but not in it, we don't reject!
• In the significance test, we look at the probability that a sample would be "as bad as or worse than" the one we got; if this probability is low, we reject the null hypothesis. But since we haven't specified a clear-cut criterion for rejection, we can sometimes "talk ourselves into" rejecting the hypothesis. For example, the P-value for a sample might come out to be .02; thus there's only a 2% chance we'd get a sample this bad if the null hypothesis is true, and we might be inclined to reject the hypothesis. However, this sample would not have fallen into the rejection region for the hypothesis test with significance level .01, since its P-value is a little larger than the significance level. Thus we would not have permitted ourselves to reject, having previously decided on our criterion.
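The contrast between the two procedures comes down to when the threshold is fixed. Here is a minimal sketch using the P-value of .02 from the paragraph above; the function `reject_null` and the looser threshold of .05 are illustrative assumptions, not part of the text.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Reject H0 exactly when the P-value is at or below the significance level."""
    return p_value <= alpha


p_value = 0.02

# Hypothesis test: alpha = .01 was fixed before the sample was taken
print(reject_null(p_value, alpha=0.01))  # False

# Threshold chosen after seeing the data (hypothetical value .05)
print(reject_null(p_value, alpha=0.05))  # True
```

The same sample leads to opposite conclusions depending on a threshold picked after the fact, which is the temptation that fixing the significance level in advance removes.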

### Hypothesis and Significance Tests on the Population Mean

Frequently, our hypothesis will concern the value of the mean of a population.

ex:

Consider the machine filling popcorn boxes discussed in an example in section 7.4, in which the mean fill (in ounces) was adjustable but the standard deviation arose from a built-in tolerance and was known to be σ = .3 ounces. Suppose the machine is supposed to be set so that the mean fill is at least 14.0 ounces of popcorn per box, but we suspect that it has gone out of adjustment and the mean fill is now less than 14 ounces.
Our hypotheses would be:
H0 : μ = 14.0
H1 : μ < 14.0
Hypothesis Test
Suppose we want to test our hypothesis with a hypothesis test at significance level α = .05 (so that there will only be a 5% chance we'll conclude that the machine is out of adjustment when it is in fact OK).

To test, we'll take a sample of 50 boxes, and look at the sample mean x̄; if x̄ is sufficiently less than 14.0, we'll conclude that the machine is out of adjustment and that the mean fill is indeed less than 14.0 ounces.

Rejection region: how far below 14.0 should x̄ be for us to reject H0?
Consider:

• Assuming that the weights of boxes are normally distributed, x̄ will also be normally distributed. (Since n is large (n = 50), x̄ will be approximately normally distributed even if the weights aren't normally distributed, by the Central Limit Theorem.)
• Thus
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
will have the standard normal distribution; using the null hypothesis, that μ = 14.0 ounces, and the known value of σ = .3 ounces, this becomes
$$Z = \frac{\bar{x} - 14.0}{.3/\sqrt{50}}.$$
• We want the Z value such that only 5% of Z-values would be below this value just due to random chance; this value is the critical value  -z.05 = -1.645.

• Thus if the null hypothesis is true and μ = 14.0, only 5% of samples will have
$$\frac{\bar{x} - 14.0}{.3/\sqrt{50}} \le -1.645.$$
• Solving for x̄, we find that if H0 is true, only 5% of samples will have
$$\bar{x} \le 14.0 - 1.645 \cdot \frac{.3}{\sqrt{50}} \approx 13.93$$
just due to sampling variation. If we get a sample with x̄ at or below this level, it's more likely that the null hypothesis isn't true and that the machine is out of adjustment.
• This gives us our rejection region; reject H0 if our sample yields x̄ <= 13.93
Suppose we now take our sample of 50 popcorn boxes, and find that x̄ = 13.88 ounces. Then by the above, since the value of x̄ lies in our rejection region, we would reject the null hypothesis that the machine is adjusted properly to give a mean fill of 14.0 ounces, and accept instead the alternate hypothesis that the mean fill is set below this level.
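The rejection-region calculation can be reproduced numerically. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of a Z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 14.0    # mean fill under the null hypothesis (ounces)
sigma = 0.3   # known standard deviation (ounces)
n = 50        # sample size
alpha = 0.05  # significance level, fixed before sampling

# Critical value: the point with 5% of the standard normal distribution below it
z_crit = NormalDist().inv_cdf(alpha)     # about -1.645

# Convert the critical Z value back to the scale of the sample mean
cutoff = mu0 + z_crit * sigma / sqrt(n)  # about 13.93

x_bar = 13.88  # observed sample mean
reject = x_bar <= cutoff
print(f"z = {z_crit:.3f}, cutoff = {cutoff:.2f}, reject H0: {reject}")
```

Note that the cutoff depends only on the null value, the standard deviation, the sample size, and the significance level, so it can be computed before any boxes are weighed.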

Significance Test
To test the hypothesis via a significance test, we would not bother to figure out a rejection region; we'd just take a sample, look at the value of x̄ obtained, and compute its P-value to see if it's low enough for us to reject the null hypothesis.

To compute the P-value for the sample above, with x̄ = 13.88 ounces, we need to find the probability that we'd get a sample with a mean this low or lower just by chance if the null hypothesis is in fact true. Thus we want to find   P(x̄ <= 13.88),   assuming that the population mean is μ = 14.0.
Using the fact that x̄ is normally distributed, we use the Z distribution to compute this probability:

$$P(\bar{x} \le 13.88) = P\left(Z \le \frac{13.88 - 14.0}{.3/\sqrt{50}}\right) = P(Z \le -2.83) = .0023$$

Thus there's only about a 0.2% chance we'd get a sample with a mean this low or lower if the null hypothesis were in fact true; since this is quite unlikely, we would reject the null hypothesis.
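The P-value for the sample mean can be evaluated directly from the quantities given in the example (μ = 14.0, σ = .3, n = 50, sample mean 13.88). This is a minimal sketch, again assuming Python 3.8 or later with `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 14.0, 0.3, 50  # null mean, known standard deviation, sample size
x_bar = 13.88                  # observed sample mean

# Standardize the sample mean under the null hypothesis
z = (x_bar - mu0) / (sigma / sqrt(n))  # about -2.83

# Lower-tail probability, since the alternative hypothesis is that the mean is below 14.0
p_value = NormalDist().cdf(z)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

An equivalent route is to skip the standardization and evaluate `NormalDist(mu0, sigma / sqrt(n)).cdf(x_bar)`, which describes the sampling distribution of the mean directly.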
