8.4 Significance Testing
A significance test differs from a hypothesis test in only one way. In
both, we assume the null hypothesis is true; then:
Hypothesis Test: specify a rejection region (for the desired significance
level) at the outset, then see whether our test statistic falls into this
region.
Significance Test: compute the probability that the value of our test
statistic would be as extreme as (or more extreme than) the observed value
if the null hypothesis were true; this probability is called the P-value.
Reject the null hypothesis if the P-value is small enough.
Consider the Democrat example from the previous section. Suppose that we
take our sample of 20 students, and find that X = 5 of them are Democrats.
The P-value
for this sample is obtained by computing the probability that we'd get
a sample with this few Democrats in it, or even fewer, if the null hypothesis
were true. Thus we assume that p >= .40 (that 40% or more of students at
large are Democrats), and find the probability that X <= 5. Using
p = .40, the distribution of X would be a binomial distribution with n
= 20 and p = .4; then we find that in this case, P(X <=
5) = .1256. If we used any value of p greater than
.40, the probability would be even less than this. Thus we use as our P-value
P = .1256: this is the probability that we'd get a sample "this
bad, or even worse," just due to sampling variation. Thus there's a 13%
chance we'd get a sample with 5 or fewer Democrats in it if in fact the
percentage of Democrats is 40%. We now have to decide if this probability
is small enough for us to be able to justify rejecting the null hypothesis.
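The P-value quoted above can be checked directly from the binomial
formula. The following is a minimal sketch using only the Python standard
library; the helper name binom_cdf is an illustrative choice, not
something from the text. It also evaluates a few larger values of p to
show that the lower-tail probability only shrinks as p moves above .40.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), summing the exact terms."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 20          # sample size from the example
observed = 5    # number of Democrats found in the sample

# P-value at the boundary of the null hypothesis, p = .40
p_value = binom_cdf(observed, n, 0.40)
print(f"P(X <= {observed}) with p = 0.40: {p_value:.4f}")

# Other values of p allowed by the null hypothesis (p >= .40) give
# smaller lower-tail probabilities, so p = .40 is the worst case.
for p in (0.45, 0.50, 0.60):
    print(f"P(X <= {observed}) with p = {p:.2f}: {binom_cdf(observed, n, p):.4f}")
```

The first line printed should agree with the value .1256 given above.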
What's really the difference between a hypothesis test and a significance
test? In the hypothesis test, we decide on the criterion for rejecting
the null hypothesis before the sample is taken; if the results of
the sample don't meet our criteria, we don't reject the hypothesis. In
a significance test, we sample first, and look at the result, and decide
if this is unlikely enough for us to conclude that the null hypothesis
is false.
Which is better? Both are used, but statisticians prefer the approach
used in the hypothesis test, as a way to "keep themselves honest."
If we choose a significance level of .01 in designing a hypothesis test,
and then compute our rejection region based on this, we will then only
reject the null hypothesis if the sample we get is one of the "1% of worst
samples" that would occur if the null hypothesis is true. If we get a sample
that is close to our rejection region, but not in it, we don't reject!
In the significance test, we look at the probability that a sample would
be "as bad as or worse than" the one we got; if this probability is low,
we reject the null hypothesis. But since we haven't specified a clear-cut
criterion for rejection, we can sometimes "talk ourselves into" rejecting
the hypothesis. For example, the P-value for a sample might come out to
be .02; thus there's only a 2% chance we'd get a sample this bad if the
null hypothesis is true, and we might be inclined to reject the hypothesis.
However, this sample would not have fallen into the rejection region for
the hypothesis test with significance level .01, since its P-value is a
little larger than the significance level. Thus we would not have permitted
ourselves to reject, having previously decided on our criterion.
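The relationship between the two procedures can be sketched in code for
the Democrat example (n = 20, null value p = .40). The significance
levels used here, .05 and .01, are assumptions chosen for illustration,
and the function names are hypothetical. The sketch finds the rejection
region fixed in advance by a hypothesis test, then confirms that
"X falls in the rejection region" and "P-value is at most the
significance level" always give the same decision, provided the level is
chosen before looking at the data.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha when p = p0 (None if there is none).

    The lower-tail rejection region is then X <= c.
    """
    c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha:
            c = k
        else:
            break  # the CDF only increases, so no larger k can qualify
    return c

n, p0 = 20, 0.40
for alpha in (0.05, 0.01):  # assumed levels, fixed before sampling
    c = critical_value(n, p0, alpha)
    print(f"alpha = {alpha}: reject if X <= {c}")
    # Both decision rules agree for every possible sample outcome.
    for x in range(n + 1):
        assert (x <= c) == (binom_cdf(x, n, p0) <= alpha)

# The situation described in the text: a P-value of .02 does not meet a
# criterion of .01 that was fixed in advance.
print("Reject at level .01 with P-value .02?", 0.02 <= 0.01)
```

The discipline lies entirely in when the level is chosen; the arithmetic
of the two tests is the same.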
Hypothesis and Significance Tests on the Population Mean
Frequently, our hypothesis will concern the value of the mean of a population.
Consider the machine filling popcorn boxes discussed in an example
in section 7.4, in which the mean fill (in
ounces) was adjustable but the standard deviation arose from a built-in
tolerance and was known to be $\sigma = .3$ ounces.
Suppose the machine is supposed to be set so that the mean fill is at least
14.0 ounces of popcorn per box, but we suspect that it has gone out of
adjustment and the mean fill is now less than 14 ounces.
Our hypotheses would be:
$H_0: \mu = 14.0$
$H_1: \mu < 14.0$
Suppose we want to test our hypothesis with a hypothesis test at significance
level $\alpha = .05$ (so that there will only be a
5% chance we'll conclude that the machine is out of adjustment when it
is in fact OK).
To test, we'll take a sample of 50 boxes, and look at the sample mean
$\bar{X}$; if $\bar{X}$ is sufficiently less than 14.0, we'll conclude
that the machine is out of adjustment and that the mean fill is indeed
less than 14.0 ounces.
Rejection region: how far below 14.0 should $\bar{X}$ be for us to
reject $H_0$?
Assuming that the weights of boxes are normally distributed, $\bar{X}$
will also be normally distributed. (Since n is large (n = 50), $\bar{X}$
will be approximately normally distributed even if the weights aren't
normally distributed, by the Central Limit Theorem.) Thus
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
will have the standard normal distribution; using the null hypothesis,
that $\mu = 14.0$ ounces, and the known value of $\sigma = .3$ ounces,
this becomes
$$Z = \frac{\bar{X} - 14.0}{.3/\sqrt{50}}$$
We want the Z value such that only 5% of Z-values would be below this value
just due to random chance; this value is the critical value
$-z_{.05} = -1.645$.
Thus if the null hypothesis is true and $\mu = 14.0$, only 5% of samples
will have
$$\frac{\bar{X} - 14.0}{.3/\sqrt{50}} \le -1.645$$
Solving for $\bar{X}$, we find that if $H_0$ is true, only 5% of samples
will have
$$\bar{X} \le 14.0 - 1.645 \cdot \frac{.3}{\sqrt{50}} \approx 13.93$$
just due to sampling variation. If we get a sample with $\bar{X}$
at or below this level, it's more likely that the null hypothesis isn't
true and that the machine is out of adjustment.
This gives us our rejection region; reject $H_0$ if our sample yields
$\bar{X} \le 13.93$.
Suppose we now take our sample of 50 popcorn boxes, and find that
$\bar{X} = 13.88$ ounces. Then by the above, since the value of $\bar{X}$
lies in our rejection region, we would reject the null hypothesis that
the machine is adjusted properly to give a mean fill of 14.0 ounces, and
accept instead the alternate hypothesis that the mean fill is set below
14.0 ounces.
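The cutoff for the rejection region can be reproduced numerically. This
is a minimal sketch using statistics.NormalDist from the Python standard
library (Python 3.8 or later); the variable names are illustrative, and
the numbers are those of the popcorn example (null mean 14.0, known
standard deviation .3, n = 50, significance level .05).

```python
from math import sqrt
from statistics import NormalDist

mu0 = 14.0     # mean fill under the null hypothesis (ounces)
sigma = 0.3    # known population standard deviation (ounces)
n = 50         # sample size
alpha = 0.05   # significance level, chosen before sampling

# Standard deviation of the sample mean
standard_error = sigma / sqrt(n)

# Critical value -z_.05: the point with 5% of the standard normal below it
z_crit = NormalDist().inv_cdf(alpha)

# Reject H0 when the sample mean is at or below this cutoff
threshold = mu0 + z_crit * standard_error

x_bar = 13.88  # sample mean observed in the example
reject = x_bar <= threshold

print(f"critical z value:  {z_crit:.3f}")
print(f"rejection cutoff:  {threshold:.2f} ounces")
print(f"reject H0 for sample mean {x_bar}? {reject}")
```

The cutoff works out to about 13.93 ounces, so a sample mean of 13.88
ounces lies in the rejection region.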
To test the hypothesis via a significance test, we would not bother
to figure out a rejection region; we'd just take a sample, look at the
value of $\bar{X}$ obtained, and compute its P-value to see if it's low
enough for us to reject the null hypothesis.
To compute the P-value for the sample above, with $\bar{X} = 13.88$
ounces, we need to find the probability that we'd get a sample
with a mean this low or lower just by chance if the null hypothesis is
in fact true. Thus we want to find $P(\bar{X} \le 13.88)$, assuming that
the population mean is $\mu = 14.0$.
Using the fact that $\bar{X}$ is normally distributed, we use the Z
distribution to compute this probability:
$$P(\bar{X} \le 13.88) = P\left(Z \le \frac{13.88 - 14.0}{.3/\sqrt{50}}\right) = P(Z \le -2.83) \approx .0023$$
Thus there's only about a 0.2% chance we'd get a sample with a mean this
low or lower if the null hypothesis were in fact true; since this is quite
unlikely, we would reject the null hypothesis.
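The P-value for the popcorn sample can be computed the same way. The
sketch below again relies on statistics.NormalDist from the Python
standard library, with illustrative variable names, and assumes the
values from the example: null mean 14.0, known standard deviation .3,
n = 50, and observed sample mean 13.88.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 14.0     # mean fill under the null hypothesis (ounces)
sigma = 0.3    # known population standard deviation (ounces)
n = 50         # sample size
x_bar = 13.88  # observed sample mean (ounces)

# Standardize the observed sample mean
z = (x_bar - mu0) / (sigma / sqrt(n))

# Lower-tail probability: chance of a sample mean this low or lower
# if the null hypothesis is true
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P-value = {p_value:.4f}")
```

With these inputs z is about -2.83 and the P-value is roughly .002, well
below the .05 level used for the hypothesis test above, so both
approaches lead to rejecting the null hypothesis for this sample.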