# Chapter 7: Introduction to t Tests: Single Sample and Dependent Means

## Chapter Outline

✪ The t Test for a Single Sample
✪ The t Test for Dependent Means
✪ Assumptions of the t Test for a Single Sample and the t Test for Dependent Means
✪ Effect Size and Power for the t Test for Dependent Means
✪ Controversy: Advantages and Disadvantages of Repeated-Measures Designs
✪ Single Sample t Tests and Dependent Means t Tests in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes

At this point, you may think you know all about hypothesis testing. Here’s a surprise: what you know will not help you much as a researcher. Why? The procedures for testing hypotheses described up to this point were, of course, absolutely necessary for what you will now learn. However, those procedures involved comparing a group of scores to a known population. In real research practice, you often compare two or more groups of scores to each other, without any direct information about populations. For example, you may have two scores for each person in a group of people, such as scores on an anxiety test before and after psychotherapy or the number of familiar versus unfamiliar words recalled in a memory experiment. Or you might have one score per person for two groups of people, such


Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.

as an experimental group and a control group in a study of the effect of sleep loss on problem solving, or comparing the self-esteem test scores of a group of 10-year-old girls to a group of 10-year-old boys.

t test: hypothesis-testing procedure in which the population variance is unknown; it compares t scores from a sample to a comparison distribution called a t distribution.

These kinds of research situations are among the most common in psychology, where usually the only information available is from samples. Nothing is known about the populations that the samples are supposed to come from. In particular, the researcher does not know the variance of the populations involved, which is a crucial ingredient in Step ❷ of the hypothesis-testing process (determining the characteristics of the comparison distribution).

In this chapter, we first look at the solution to the problem of not knowing the population variance by focusing on a special situation: comparing the mean of a single sample to a population with a known mean but an unknown variance. Then, after describing how to handle this problem of not knowing the population variance, we go on to consider the situation in which there is no known population at all—the situation in which all we have are two scores for each of a number of people.

The hypothesis-testing procedures you learn in this chapter, those in which the population variance is unknown, are examples of t tests. The t test is sometimes called “Student’s t” because its main principles were originally developed by William S. Gosset, who published his research articles anonymously using the name “Student” (see Box 7–1).

## The t Test for a Single Sample

Let’s begin with an example. Suppose your college newspaper reports an informal survey showing that students at your college study an average of 17 hours per week. However, you think that the students in your dormitory study much more than that. You randomly pick 16 students from your dormitory and ask them how much they study each day. (We will assume that they are all honest and accurate.) Your result is that these 16 students study an average of 21 hours per week. Should you conclude that students in your dormitory study more than the college average? Or should you conclude that your results are close enough to the college average that the small difference of 4 hours might well be due to your having picked, purely by chance, 16 of the more studious residents in your dormitory?

In this example you have scores for a sample of individuals and you want to compare the mean of this sample to a population for which you know the mean but not the variance. Hypothesis testing in this situation is called a t test for a single sample. (It is also called a one-sample t test.) The t test for a single sample works basically the same way as the Z test you learned in Chapter 5. In the studies we considered in that chapter, you had scores for a sample of individuals (such as a group of 64 students rating the attractiveness of a person in a photograph after being told that the person has positive personality qualities) and you wanted to compare the mean of this sample to a population (in this case, a population of students not told about the person’s personality qualities). However, in the studies we considered in Chapter 5, you knew both the mean and variance of the general population to which you were going to compare your sample. In the situations we are now going to consider, everything is the same, but you don’t know the population variance. This presents two important new wrinkles affecting the details of how you carry out two of the steps of the hypothesis-testing process.

The first important new wrinkle is in Step ❷. Because the population variance is not known, you have to estimate it. So the first new wrinkle we consider is how to estimate an unknown population variance. The other important new wrinkle affects Steps ❷ and ❸. When the population variance has to be estimated, the shape of the comparison

t test for a single sample: hypothesis-testing procedure in which a sample mean is being compared to a known population mean and the population variance is unknown.


distribution is not quite a normal curve; so the second new wrinkle we consider is the shape of the comparison distribution (for Step ❷) and how to use a special table to find the cutoff (Step ❸) on what is a slightly differently shaped distribution.

Let’s return to the amount of studying example. Step ❶ of the hypothesis-testing procedure is to restate the problem as hypotheses about populations. There are two populations:

Population 1: The kind of students who live in your dormitory.
Population 2: The kind of students in general at your college.

The research hypothesis is that Population 1 students study more than Population 2 students; the null hypothesis is that Population 1 students do not study more than Population 2 students. So far, the problem is no different from those in Chapter 5.

Step ❷ is to determine the characteristics of the comparison distribution. In this example, its mean will be 17, what the survey found for students at your college generally (Population 2).

BOX 7–1 William S. Gosset, Alias “Student”: Not a Mathematician, But a Practical Man

William S. Gosset graduated from Oxford University in 1899 with degrees in mathematics and chemistry. It happened that in the same year the Guinness brewers in Dublin, Ireland, were seeking a few young scientists to take a first-ever scientific look at beer making. Gosset took one of these jobs and soon had immersed himself in barley, hops, and vats of brew.

The problem was how to make beer of a consistently high quality. Scientists such as Gosset wanted to make the quality of beer less variable, and they were especially interested in finding the cause of bad batches. A proper scientist would say, “Conduct experiments!” But a business such as a brewery could not afford to waste money on experiments involving large numbers of vats, some of which any brewer worth his hops knew would fail. So Gosset was forced to contemplate the probability of, say, a certain strain of barley producing terrible beer when the experiment could consist of only a few batches of each strain. Adding to the problem was that he had no idea of the variability of a given strain of barley—perhaps some fields planted with the same strain grew better barley. (Does this sound familiar? Poor Gosset, like today’s psychologists, had no idea of his population’s variance.)

Gosset was up to the task, although at the time only he knew that. To his colleagues at the brewery, he was a professor of mathematics and not a proper brewer at all. To his statistical colleagues, mainly at the Biometric Laboratory at University College in London, he was a mere brewer and not a proper mathematician.

So Gosset discovered the t distribution and invented the t test—simplicity itself (compared to most of statistics)—for situations when samples are small and the variability of the larger population is unknown. However, the Guinness brewery did not allow its scientists to publish papers, because one Guinness scientist had revealed brewery secrets. To this day, most statisticians call the t distribution “Student’s t” because Gosset wrote under the anonymous name “Student.” A few of his fellow statisticians knew who “Student” was, but apparently meetings with others involved the secrecy worthy of a spy novel. The brewery learned of his scientific fame only at his death, when colleagues wanted to honor him.

In spite of his great achievements, Gosset often wrote in letters that his own work provided “only a rough idea of the thing” or so-and-so “really worked out the complete mathematics.” He was remembered as a thoughtful, kind, humble man, sensitive to others’ feelings. Gosset’s friendliness and generosity with his time and ideas also resulted in many students and younger colleagues making major breakthroughs based on his help.

To learn more about William Gosset, go to http://www-history.mcs.st-andrews.ac.uk/Biographies/Gosset.html.

Sources: Peters (1987); Salsburg (2001); Stigler (1986); Tankard (1984).


The next part of Step ❷ is finding the variance of the distribution of means. Now you face a problem. Up to now in this book, you have always known the variance of the population of individuals. Using that variance, you then figured the variance of the distribution of means. However, in the present example, the variance of the number of hours studied for students at your college (the Population 2 students) was not reported in the newspaper article. So you email the paper. Unfortunately, the reporter did not figure the variance, and the original survey results are no longer available. What to do?

### Basic Principle of the t Test: Estimating the Population Variance from the Sample Scores

If you do not know the variance of the population of individuals, you can estimate it from what you do know—the scores of the people in your sample.

In the logic of hypothesis testing, the group of people you study is considered to be a random sample from a particular population. The variance of this sample ought to reflect the variance of that population. If the scores in the population have a lot of variation, then the scores in a sample randomly selected from that population should also have a lot of variation. If the population has very little variation, the scores in a sample from that population should also have very little variation. Thus, it should be possible to use the variation among the scores in the sample to make an informed guess about the spread of the scores in the population. That is, you could figure the variance of the sample’s scores, and that should be similar to the variance of the scores in the population. (See Figure 7–1.)

There is, however, one small hitch. The variance of a sample will generally be slightly smaller than the variance of the population from which it is taken. For this reason, the variance of the sample is a biased estimate of the population variance.1 It is a biased estimate because it consistently underestimates the actual variance of the population. (For example, if a population has a variance of 180, a typical sample of 20 scores might have a variance of only 171.) If we used a biased estimate of the population variance in our research studies, our results would not be accurate. Therefore, we need to identify an unbiased estimate of the population variance.

Figure 7–1 The variation in samples (as in each of the lower distributions) is similar to the variations in the populations they are taken from (each of the upper distributions).

biased estimate: estimate of a population parameter that is likely systematically to overestimate or underestimate the true value of the population parameter. For example, SD² would be a biased estimate of the population variance (it would systematically underestimate it).

unbiased estimate of the population variance (S²): estimate of the population variance, based on sample scores, which has been corrected so that it is equally likely to overestimate or underestimate the true population variance; the correction used is dividing the sum of squared deviations by the sample size minus 1, instead of the usual procedure of dividing by the sample size directly.

Fortunately, you can figure an unbiased estimate of the population variance by slightly changing the ordinary variance formula. The ordinary variance formula is the sum of the squared deviation scores divided by the number of scores. The changed formula still starts with the sum of the squared deviation scores, but divides this by the number of scores minus 1. Dividing by a slightly smaller number makes the result slightly larger. Dividing by the number of scores minus 1 makes the variance you get just enough larger to make it an unbiased estimate of the population variance. (This unbiased estimate is our best estimate of the population variance. However, it is still an estimate, so it is unlikely to be exactly the same as the true population variance. But we can be certain that our unbiased estimate of the population variance is equally likely to be too high as it is to be too low. This is what makes the estimate unbiased.)
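The effect of this correction can be checked with a quick simulation (a sketch in Python; the population mean and standard deviation here are made up for illustration). Averaged over many small samples, dividing each sample's sum of squared deviations by N falls short of the true population variance, while dividing by N − 1 lands on target:

```python
import random

random.seed(1)
pop_mean, pop_sd = 50, 10          # made-up population; its true variance is 100
n, trials = 5, 20000               # many small samples of N = 5

biased_total = unbiased_total = 0.0
for _ in range(trials):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_total += ss / n         # dividing by N
    unbiased_total += ss / (n - 1) # dividing by N - 1 (the degrees of freedom)

print(f"dividing by N:     {biased_total / trials:.1f}")   # close to 80: too small
print(f"dividing by N - 1: {unbiased_total / trials:.1f}") # close to 100: on target
```

With N = 5, the average of the biased version comes out near (N − 1)/N of the true variance, i.e., about 80 rather than 100, which is exactly the underestimation the N − 1 correction removes.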

The symbol we will use for the unbiased estimate of the population variance is S². The formula is the usual variance formula, but now dividing by N − 1:

S² = Σ(X − M)² / (N − 1) = SS / (N − 1)    (7–1)

S = √S²    (7–2)

Let’s return again to the example of hours spent studying and figure the estimated population variance from the sample’s 16 scores. First, you figure the sum of squared deviation scores. (Subtract the mean from each of the scores, square those deviation scores, and add them.) Presume in our example that this comes out to 694 (SS = 694). To get the estimated population variance, you divide this sum of squared deviation scores by the number of scores minus 1; that is, in this example, you divide 694 by 16 − 1, or 15; 694 divided by 15 comes out to 46.27. In terms of the formula,

S² = Σ(X − M)² / (N − 1) = SS / (N − 1) = 694 / 15 = 46.27

At this point, you have now seen several different types of standard deviation and variance (that is, for a sample, for a population, and unbiased estimates); and each of these types has used a different symbol. To help you keep them straight, a summary of the types of standard deviation and variance is shown in Table 7–1.
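The hours-studied figuring (and the biased versus unbiased distinction) can be checked in a few lines of Python. The SS of 694 and N of 16 come from the example above; the five-score sample is made up purely to show that Python's `statistics` module builds in the same N versus N − 1 distinction:

```python
from statistics import variance, pvariance

ss, n = 694, 16           # sum of squared deviations and sample size from the example
s2 = ss / (n - 1)         # unbiased estimate of the population variance
print(round(s2, 2))       # 46.27

# The standard library draws the same distinction (scores here are made up):
sample = [20, 25, 22, 19, 24]
print(pvariance(sample))  # 5.2 -> divides by N (the biased version)
print(variance(sample))   # 6.5 -> divides by N - 1 (the unbiased estimate, S^2)
```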

### Degrees of Freedom

The number you divide by (the number of scores minus 1) to get the estimated population variance has a special name. It is called the degrees of freedom. It has this name because it is the number of scores in a sample that are “free to vary.” The idea is that, when figuring the variance, you first have to know the mean. If you know the mean and all but one of the scores in the sample, you can figure out the one you don’t know with a little arithmetic. Thus, once you know the mean, one of the scores in the sample is not free to have any possible value. So in this kind of situation the degrees of freedom are the number of scores minus 1. In terms of a formula,

df = N − 1    (7–3)

df is the degrees of freedom.

The estimated population variance is the sum of the squared deviation scores divided by the number of scores minus 1.

The estimated population standard deviation is the square root of the estimated population variance.

degrees of freedom (df): number of scores free to vary when estimating a population parameter; usually part of a formula for making that estimate—for example, in the formula for estimating the population variance from a single sample, the degrees of freedom is the number of scores minus 1.

Table 7–1 Summary of Different Types of Standard Deviation and Variance

| Statistical Term | Symbol |
|---|---|
| Sample standard deviation | SD |
| Population standard deviation | σ |
| Estimated population standard deviation | S |
| Sample variance | SD² |
| Population variance | σ² |
| Estimated population variance | S² |

The degrees of freedom are the number of scores in the sample minus 1.


In our example, df = 16 − 1 = 15. (In some situations you learn about in later chapters, the degrees of freedom are figured a bit differently. This is because in those situations, the number of scores free to vary is different. For all the situations you learn about in this chapter, df = N − 1.)

The formula for the estimated population variance is often written using df instead of N − 1:

S² = Σ(X − M)² / df = SS / df    (7–4)

### The Standard Deviation of the Distribution of Means

Once you have figured the estimated population variance, you can figure the standard deviation of the comparison distribution using the same procedures you learned in Chapter 5. Just as before, when you have a sample of more than one, the comparison distribution is a distribution of means, and the variance of a distribution of means is the variance of the population of individuals divided by the sample size. You have just estimated the variance of the population. Thus, you can estimate the variance of the distribution of means by dividing the estimated population variance by the sample size. The standard deviation of the distribution of means is the square root of its variance. Stated as formulas,

S²M = S² / N    (7–5)

SM = √S²M    (7–6)

Note that, with an estimated population variance, the symbols for the variance and standard deviation of the distribution of means use S instead of σ.

In our example, the sample size was 16 and we worked out the estimated population variance to be 46.27. The variance of the distribution of means, based on that estimate, will be 2.89. That is, 46.27 divided by 16 equals 2.89. The standard deviation is 1.70, the square root of 2.89. In terms of the formulas,

S²M = S² / N = 46.27 / 16 = 2.89

SM = √S²M = √2.89 = 1.70
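Formulas (7–5) and (7–6) are one line of code each (a sketch using the example's figures):

```python
from math import sqrt

s2, n = 46.27, 16    # estimated population variance and sample size from the example
s2_m = s2 / n        # variance of the distribution of means, formula (7-5)
s_m = sqrt(s2_m)     # its standard deviation, formula (7-6)
print(f"{s2_m:.2f} {s_m:.2f}")  # 2.89 1.70
```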

### The Shape of the Comparison Distribution When Using an Estimated Population Variance: The t Distribution

In Chapter 5 you learned that when the population distribution follows a normal curve, the shape of the distribution of means will also be a normal curve. However, this changes when you do hypothesis testing with an estimated population variance. When you are using an estimated population variance, you have less true information and more room for error. The mathematical effect is that there are likely to be slightly more extreme means than in an exact normal curve. Further, the smaller your


TIP FOR SUCCESS: Be sure that you fully understand the difference between S² and S²M. These terms look quite similar, but they are quite different. S² is the estimated variance of the population of individuals. S²M is the estimated variance of the distribution of means (based on the estimated variance of the population of individuals, S²).

The estimated population variance is the sum of squared deviations divided by the de- grees of freedom.

The variance of the distribu- tion of means based on an es- timated population variance is the estimated population variance divided by the num- ber of scores in the sample.

The standard deviation of the distribution of means based on an estimated population vari- ance is the square root of the variance of the distribution of means based on an estimated population variance.


sample size, the bigger this tendency. This is because, with a smaller sample size, your estimate of the population variance is based on less information.

The result of all this is that, when doing hypothesis testing using an estimated variance, your comparison distribution will not be a normal curve. Instead, the comparison distribution will be a slightly different curve called a t distribution.

Actually, there is a whole family of t distributions. They vary in shape according to the degrees of freedom you used to estimate the population variance. However, for any particular degrees of freedom, there is only one t distribution.

Generally, t distributions look to the eye like a normal curve—bell-shaped, symmetrical, and unimodal. A t distribution differs subtly in having heavier tails (that is, slightly more scores at the extremes). Figure 7–2 shows the shape of a t distribution compared to a normal curve.

This slight difference in shape affects how extreme a score you need to reject the null hypothesis. As always, to reject the null hypothesis, your sample mean has to be in an extreme section of the comparison distribution of means, such as the top 5%. However, if the comparison distribution has more of its means in the tails than a normal curve would have, then the point where the top 5% begins has to be farther out on this comparison distribution. The result is that it takes a slightly more extreme sample mean to get a significant result when using a t distribution than when using a normal curve.

Just how much the t distribution differs from the normal curve depends on the degrees of freedom, the amount of information used in estimating the population variance. The t distribution differs most from the normal curve when the degrees of freedom are low (because your estimate of the population variance is based on a very small sample). For example, using the normal curve, you may recall that 1.64 is the cutoff for a one-tailed test at the .05 level. On a t distribution with 7 degrees of freedom (that is, with a sample size of 8), the cutoff is 1.895 for a one-tailed test at the .05 level. If your estimate is based on a larger sample, say a sample of 25 (so that df = 24), the cutoff is 1.711, a cutoff much closer to that for the normal curve. If your sample size is infinite, the t distribution is the same as the normal curve. (Of course, if your sample size were infinite, it would include the entire population!) But even with sample sizes of 30 or more, the t distribution is nearly identical to the normal curve.
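If SciPy is available, you can watch these cutoffs converge toward the normal-curve value (a sketch; `scipy.stats.t.ppf(0.95, df)` returns the score below which 95% of the t distribution falls, i.e., the one-tailed .05 cutoff):

```python
from scipy.stats import norm, t

# One-tailed .05 cutoffs shrink toward the normal cutoff as df grows:
for df in (7, 24, 100, 1000):
    print(df, round(t.ppf(0.95, df), 3))   # df=7 -> 1.895, df=24 -> 1.711
print("normal", round(norm.ppf(0.95), 3)) # 1.645
```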

Shortly, you will learn how to find the cutoff using a t distribution, but let’s first return briefly to the example of how much students in your dorm study each week. You finally have everything you need for Step ❷ about the characteristics of the comparison distribution. We have already seen that the distribution of means in this example has a mean of 17 hours and a standard deviation of 1.70. You can now add that the shape of the comparison distribution will be a t distribution with 15 degrees of freedom.2

Figure 7–2 A t distribution (dashed blue line) compared to the normal curve (solid black line).

t distribution: mathematically defined curve that is the comparison distribution used in a t test.


### The Cutoff Sample Score for Rejecting the Null Hypothesis: Using the t Table

Step ❸ of hypothesis testing is determining the cutoff for rejecting the null hypothesis. There is a different t distribution for any particular degrees of freedom. However, to avoid taking up pages and pages with tables for each possible t distribution, you use a simplified table that gives only the crucial cutoff points. We have included such a t table in the Appendix (Table A–2). Just as with the normal curve table, the t table shows only positive t scores. If you have a one-tailed test, you need to decide whether your cutoff score is a positive t score or a negative t score. If your one-tailed test is testing whether the mean of Population 1 is greater than the mean of Population 2, the cutoff t score is positive. However, if your one-tailed test is testing whether the mean of Population 1 is less than the mean of Population 2, the cutoff t score is negative.

In the hours-studied example, you have a one-tailed test. (You want to know whether students in your dorm study more than students in general at your college study.) You will probably want to use the 5% significance level, because the cost of a Type I error (mistakenly rejecting the null hypothesis) is not great. You have 16 participants, making 15 degrees of freedom for your estimate of the population variance.

Table 7–2 shows a portion of the t table from Table A–2 in the Appendix. Find the column for the .05 significance level for one-tailed tests and move down to the row for 15 degrees of freedom. The crucial cutoff is 1.753. In this example, you are testing whether students in your dormitory (Population 1) study more than students in general at your college (Population 2). In other words, you are testing whether

Table 7–2 Cutoff Scores for t Distributions with 1 Through 17 Degrees of Freedom (Highlighting Cutoff for Hours-Studied Example)

| df | One-tailed .10 | One-tailed .05 | One-tailed .01 | Two-tailed .10 | Two-tailed .05 | Two-tailed .01 |
|---|---|---|---|---|---|---|
| 1 | 3.078 | 6.314 | 31.821 | 6.314 | 12.706 | 63.657 |
| 2 | 1.886 | 2.920 | 6.965 | 2.920 | 4.303 | 9.925 |
| 3 | 1.638 | 2.353 | 4.541 | 2.353 | 3.182 | 5.841 |
| 4 | 1.533 | 2.132 | 3.747 | 2.132 | 2.776 | 4.604 |
| 5 | 1.476 | 2.015 | 3.365 | 2.015 | 2.571 | 4.032 |
| 6 | 1.440 | 1.943 | 3.143 | 1.943 | 2.447 | 3.708 |
| 7 | 1.415 | 1.895 | 2.998 | 1.895 | 2.365 | 3.500 |
| 8 | 1.397 | 1.860 | 2.897 | 1.860 | 2.306 | 3.356 |
| 9 | 1.383 | 1.833 | 2.822 | 1.833 | 2.262 | 3.250 |
| 10 | 1.372 | 1.813 | 2.764 | 1.813 | 2.228 | 3.170 |
| 11 | 1.364 | 1.796 | 2.718 | 1.796 | 2.201 | 3.106 |
| 12 | 1.356 | 1.783 | 2.681 | 1.783 | 2.179 | 3.055 |
| 13 | 1.350 | 1.771 | 2.651 | 1.771 | 2.161 | 3.013 |
| 14 | 1.345 | 1.762 | 2.625 | 1.762 | 2.145 | 2.977 |
| 15 | 1.341 | **1.753** | 2.603 | 1.753 | 2.132 | 2.947 |
| 16 | 1.337 | 1.746 | 2.584 | 1.746 | 2.120 | 2.921 |
| 17 | 1.334 | 1.740 | 2.567 | 1.740 | 2.110 | 2.898 |

t table: table of cutoff scores on the t distribution for various degrees of freedom, significance levels, and one- and two-tailed tests.


students in your dormitory have a higher t score than students in general. This means that the cutoff t score is positive. Thus, you will reject the null hypothesis if your sample’s mean is 1.753 or more standard deviations above the mean on the comparison distribution. (If you were using a known variance, you would have found your cutoff from a normal curve table. The Z score to reject the null hypothesis based on the normal curve would have been 1.645.)

One other point about using the t table: in the full t table in the Appendix, there are rows for each degree of freedom from 1 through 30, then for 35, 40, 45, and so on up to 100. Suppose your study has degrees of freedom between two of these higher values. To be safe, you should use the nearest degrees of freedom to yours given on the table that is less than yours. For example, in a study with 43 degrees of freedom, you would use the cutoff for df = 40.
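This "nearest lower df" rule can be expressed directly in code (a sketch; `nearest_table_df` is a hypothetical helper name, and the row list mirrors the table layout just described):

```python
# Rows available in the full t table: df = 1 through 30, then 35, 40, ..., 100
TABLE_DFS = list(range(1, 31)) + list(range(35, 101, 5))

def nearest_table_df(df: int) -> int:
    """Largest tabled df that does not exceed the study's df (the 'safe' choice)."""
    return max(d for d in TABLE_DFS if d <= df)

print(nearest_table_df(43))  # 40, as in the text's example
```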

### The Sample Mean’s Score on the Comparison Distribution: The t Score

Step ❹ of hypothesis testing is figuring your sample mean’s score on the comparison distribution. In Chapter 5, this meant finding the Z score on the comparison distribution—the number of standard deviations your sample’s mean is from the mean on the distribution. You do exactly the same thing when your comparison distribution is a t distribution. The only difference is that, instead of calling this a Z score, because it is from a t distribution, you call it a t score. In terms of a formula,

t = (M − μ) / SM    (7–7)

In the example, your sample’s mean of 21 is 4 hours from the mean of the distribution of means, which amounts to 2.35 standard deviations from the mean (4 hours divided by the standard deviation of 1.70 hours).3 That is, the t score in the example is 2.35. In terms of the formula,

t = (M − μ) / SM = (21 − 17) / 1.70 = 4 / 1.70 = 2.35

### Deciding Whether to Reject the Null Hypothesis

Step ❺ of hypothesis testing is deciding whether to reject the null hypothesis. This step is exactly the same with a t test as it was in the hypothesis-testing situations discussed in previous chapters. In the example, the cutoff t score was 1.753 and the actual t score for your sample was 2.35. Conclusion: reject the null hypothesis. The research hypothesis is supported that students in your dorm study more than students in the college overall.

Figure 7–3 shows the various distributions for this example.
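The five steps for the hours-studied example can be collected into a short script (a sketch in Python; the figures SS = 694, N = 16, μ = 17, M = 21, and the tabled cutoff of 1.753 all come from the example above):

```python
from math import sqrt

mu = 17          # population mean (college-wide survey)
m, n = 21, 16    # sample mean and sample size (your dormitory)
ss = 694         # sum of squared deviations in the sample
cutoff = 1.753   # one-tailed .05 cutoff for df = 15, from Table 7-2

s2 = ss / (n - 1)        # Step 2: estimated population variance (about 46.27)
s_m = sqrt(s2 / n)       # Step 2: SD of the distribution of means (about 1.70)
t_score = (m - mu) / s_m # Step 4: the t score
print(f"t = {t_score:.2f}")  # t = 2.35
print("reject the null hypothesis" if t_score >= cutoff else "do not reject")
```

Because 2.35 exceeds the 1.753 cutoff, the script prints the same decision reached in the text.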

### Summary of Hypothesis Testing When the Population Variance Is Not Known

Table 7–3 compares the hypothesis-testing procedure we just considered (for a t test for a single sample) with the hypothesis-testing procedure for a Z test from Chapter 5. That is, we are comparing the current situation, in which you know the population’s mean but not its variance, to the Chapter 5 situation, where you knew the population’s mean and variance.


The t score is your sample’s mean minus the population mean, divided by the standard deviation of the distribution of means.

t score: on a t distribution, the number of standard deviations from the mean (like a Z score, but on a t distribution).


Figure 7–3 Distributions for the hours-studied example: the population (normal), the comparison distribution (a t distribution), and the sample.

Table 7–3 Hypothesis Testing with a Single Sample Mean When Population Variance Is Unknown (t Test for a Single Sample) Compared to When Population Variance Is Known (Z Test)

| Steps in Hypothesis Testing | Difference From When Population Variance Is Known |
|---|---|
| ❶ Restate the question as a research hypothesis and a null hypothesis about the populations. | No difference in method. |
| ❷ Determine the characteristics of the comparison distribution: | |
| Population mean | No difference in method. |
| Standard deviation of the distribution of sample means | No difference in method (but based on estimated population variance). |
| Population variance | Estimate from the sample. |
| Shape of the comparison distribution | Use the t distribution with df = N − 1. |
| ❸ Determine the significance cutoff. | Use the t table. |
| ❹ Determine your sample’s score on the comparison distribution. | No difference in method (but called a t score). |
| ❺ Decide whether to reject the null hypothesis. | No difference in method. |

### Another Example of a t Test for a Single Sample

Consider another fictional example. Suppose a researcher was studying the psychological effects of a devastating flood in a small rural community. Specifically, the researcher was interested in how hopeful (versus unhopeful) people felt after the flood. The


researcher randomly selected 10 people from this community to complete a short questionnaire. The key item on the questionnaire asked how hopeful they felt, using a 7-point scale from extremely unhopeful (1) to neutral (4) to extremely hopeful (7). The researcher wanted to know whether the ratings of hopefulness for people who had been through the flood would be consistently above or below the neutral point on the scale (4).

Table 7–4 shows the results and figuring for the t test for a single sample; Figure 7–4 shows the distributions involved. Here are the steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who experienced the flood.
Population 2: People who are neither hopeful nor unhopeful.

The research hypothesis is that the two populations will score differently. The null hypothesis is that they will score the same.

❷ Determine the characteristics of the comparison distribution. If the null hypothesis is true, the mean of both populations is 4. The variance of these populations is not known, so you have to estimate it from the sample. As shown in Table 7–4, the sum of the squared deviations of the sample's scores from the sample's mean is 32.10. Thus, the estimated population variance is 32.10 divided by 9 degrees of freedom (10 − 1), which comes out to 3.57.

The distribution of means has a mean of 4 (the same as the population mean). Its variance is the estimated population variance divided by the sample size (3.57

Table 7–4 Results and Figuring for a Single-Sample t Test for a Study of 10 People's Ratings of Hopefulness Following a Devastating Flood (Fictional Data)

| Rating (X) | Difference From the Mean (X − M) | Squared Difference From the Mean (X − M)² |
| --- | --- | --- |
| 5 | .30 | .09 |
| 3 | −1.70 | 2.89 |
| 6 | 1.30 | 1.69 |
| 2 | −2.70 | 7.29 |
| 7 | 2.30 | 5.29 |
| 6 | 1.30 | 1.69 |
| 7 | 2.30 | 5.29 |
| 4 | −.70 | .49 |
| 2 | −2.70 | 7.29 |
| 5 | .30 | .09 |
| Σ: 47 | | 32.10 |

M = (ΣX)/N = 47/10 = 4.70.
df = N − 1 = 10 − 1 = 9.
μ = 4.00.
S² = SS/df = 32.10/(10 − 1) = 32.10/9 = 3.57.
S²M = S²/N = 3.57/10 = .36.
SM = √S²M = √.36 = .60.
t with df = 9 needed for 1% significance level, two-tailed = ±3.250.
Actual sample t = (M − μ)/SM = (4.70 − 4.00)/.60 = .70/.60 = 1.17.
Decision: Do not reject the null hypothesis.

TIP FOR SUCCESS: Be careful. To find the variance of a distribution of means, you always divide the population variance by the sample size. This is true whether the population's variance is known or only estimated. It is only when making the estimate of the population variance that you divide by the sample size minus 1. That is, the degrees of freedom are used only when estimating the variance of the population of individuals.


divided by 10 equals .36). The square root of this, the standard deviation of the distribution of means, is .60. Its shape will be a t distribution for df = 9.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The researcher wanted to be very cautious about mistakenly concluding that the flood made a difference. Thus, she decided to use the .01 significance level. The hypothesis was nondirectional (that is, no specific direction of difference from the mean of 4 was specified; either result would have been of interest); so the researcher used a two-tailed test. The researcher looked up the cutoff in Table 7–2 (or Table A–2 in the Appendix) for a two-tailed test and 9 degrees of freedom. The cutoff given in the table is 3.250. Thus, to reject the null hypothesis, the sample's score on the comparison distribution must be 3.250 or higher, or −3.250 or lower.

❹ Determine your sample's score on the comparison distribution. The sample's mean of 4.70 is .70 scale points from the null hypothesis mean of 4.00. That makes it 1.17 standard deviations on the comparison distribution from that distribution's mean (.70/.60 = 1.17); that is, t = 1.17.

❺ Decide whether to reject the null hypothesis. The t of 1.17 is not as extreme as the needed t of ±3.250. Therefore, the researcher cannot reject the null hypothesis. The study is inconclusive. (If the researcher had used a larger sample, giving more power, the result might have been quite different.)

Summary of Steps for a t Test for a Single Sample

Table 7–5 summarizes the steps of hypothesis testing when you have scores from a single sample and a population with a known mean but an unknown variance.4

Figure 7–4 Distributions for the example of how hopeful individuals felt following a devastating flood: the population (normal, with a mean of 4.00), the comparison distribution (a t distribution), and the sample (mean 4.70).


Table 7–5 Steps for a t Test for a Single Sample

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.

❷ Determine the characteristics of the comparison distribution.

a. The mean is the same as the known population mean.

b. The standard deviation is figured as follows:

●A Figure the estimated population variance: S² = SS/df.

●B Figure the variance of the distribution of means: S²M = S²/N.

●C Figure the standard deviation of the distribution of means: SM = √S²M.

c. The shape will be a t distribution with N − 1 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.

a. Decide the significance level and whether to use a one-tailed or a two-tailed test.

b. Look up the appropriate cutoff in a t table.

❹ Determine your sample's score on the comparison distribution: t = (M − μ)/SM.

❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
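As a check on the flood example, the steps in Table 7–5 can be carried out in a few lines of Python. This is our own sketch, not part of the text; the variable names mirror the book's symbols:

```python
import math

# Hopefulness ratings from Table 7-4 (fictional flood data);
# mu = 4 is the neutral scale point used as the null-hypothesis mean.
scores = [5, 3, 6, 2, 7, 6, 7, 4, 2, 5]
mu = 4.0

n = len(scores)
m = sum(scores) / n                     # M = 47/10 = 4.70
ss = sum((x - m) ** 2 for x in scores)  # SS = 32.10
s2 = ss / (n - 1)                       # S^2 = SS/df = 3.57
s2_m = s2 / n                           # S^2M = S^2/N = .36
s_m = math.sqrt(s2_m)                   # SM = .60
t = (m - mu) / s_m                      # t = .70/.60 = 1.17

# With df = 9 the two-tailed .01 cutoff is +/-3.250 (from a t table),
# and |1.17| < 3.250, so the null hypothesis is not rejected.
print(round(t, 2))  # 1.17
```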

How are you doing?

1. In what sense is a sample’s variance a biased estimate of the variance of the population the sample is taken from? That is, in what way does the sample’s variance typically differ from the population’s?

2. What is the difference between the usual formula for figuring the variance and the formula for estimating a population’s variance from the scores in a sample (that is, the formula for an unbiased estimate of the population variance)?

3. (a) What are degrees of freedom? (b) How do you figure the degrees of freedom in a t test for a single sample? (c) What do they have to do with estimating the population variance? (d) What do they have to do with the t distribution?

4. (a) How does a t distribution differ from a normal curve? (b) How do degrees of freedom affect this? (c) What is the effect of the difference on hypothesis testing?

5. List three differences in how you do hypothesis testing for a t test for a single sample versus the Z test (which you learned in Chapter 5).

6. A population has a mean of 23. A sample of 4 is given an experimental proce- dure and has scores of 20, 22, 22, and 20. Test the hypothesis that the proce- dure produces a lower score. Use the .05 significance level. (a) Use the steps of hypothesis testing and (b) make a sketch of the distributions involved.

6. (a) Steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who are given the experimental procedure.
Population 2: The general population.

The research hypothesis is that Population 1 will score lower than Population 2. The null hypothesis is that Population 1 will not score lower than Population 2.

❷ Determine the characteristics of the comparison distribution.

a. The mean of the distribution of means is 23.

b. The standard deviation is figured as follows:

●A Figure the estimated population variance. You first need to figure the sample mean, which is M = (20 + 22 + 22 + 20)/4 = 84/4 = 21. The estimated population variance is S² = SS/(N − 1) = [(20 − 21)² + (22 − 21)² + (22 − 21)² + (20 − 21)²]/(4 − 1) = (1 + 1 + 1 + 1)/3 = 4/3 = 1.33.

●B Figure the variance of the distribution of means: S²M = S²/N = 1.33/4 = .33.

●C Figure the standard deviation of the distribution of means: SM = √S²M = √.33 = .57.

c. The shape of the comparison distribution will be a t distribution with df = 3.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. From Table A–2, the cutoff for a one-tailed t test at the .05 level for df = 3 is −2.353. The cutoff t score is negative, since the research hypothesis is that the procedure produces a lower score.

❹ Determine your sample's score on the comparison distribution. t = (M − μ)/SM = (21 − 23)/.57 = −2/.57 = −3.51.

❺ Decide whether to reject the null hypothesis. The t of −3.51 is more extreme than the needed t of −2.353. Therefore, reject the null hypothesis; the research hypothesis is supported.

(b) Sketches of distributions are shown in Figure 7–5.
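The figuring for question 6 can also be verified in Python (our own sketch). Note that carrying full precision gives t = −3.46; the worked answer's −3.51 comes from rounding SM to .57 before dividing. Either way, the result is more extreme than the one-tailed .05 cutoff of −2.353 for df = 3:

```python
import math

# Scores after the experimental procedure, tested against the
# known population mean of 23 ("How are you doing?" question 6).
scores = [20, 22, 22, 20]
mu = 23.0

n = len(scores)
m = sum(scores) / n                                  # M = 21
s2 = sum((x - m) ** 2 for x in scores) / (n - 1)     # S^2 = 4/3 = 1.33
s_m = math.sqrt(s2 / n)                              # SM = .577 (about .57)
t = (m - mu) / s_m                                   # t = -3.46 at full precision
```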

Figure 7–5 Distributions for answer to "How Are You Doing?" question 6b: the population (normal, with a mean of 23), the comparison distribution (a t distribution), and the sample (mean 21).


repeated-measures design: research strategy in which each person is tested more than once; same as within subjects design.

t test for dependent means: hypothesis-testing procedure in which there are two scores for each person and the population variance is not known; it determines the significance of a hypothesis that is being tested using difference or change scores from a single group of people.

The t Test for Dependent Means

The situation you just learned about (the t test for a single sample) is for when you know the population mean but not its variance and you have a single sample of scores. It turns out that in most research you do not even know the population's mean; plus, in most research situations you usually have not one set, but two sets, of scores. These two things, not knowing the population mean and having two sets of scores, are very, very common.

The rest of this chapter focuses specifically on this important research situation in which you have two scores from each person in your sample. This kind of research situation is called a repeated-measures design (also known as a within subjects design). A common example is when you measure the same people before and after some psychological or social intervention. For example, a psychologist might measure the quality of men's communication before and after receiving premarital counseling.

The hypothesis-testing procedure for the situation in which each person is measured twice (that is, for the situation in which we have a repeated-measures design) is a t test for dependent means. It has the name "dependent means" because the means for the two groups of scores (for example, a group of before-scores and a group of after-scores) are dependent on each other in that they are both from the same people. (In Chapter 8, we consider the situation in which you compare scores from two different groups of people, a research situation you analyze using a t test for independent means.)

You do a t test for dependent means exactly the same way as a t test for a single sample, except that (a) you use something called difference scores, and (b) you assume that the population mean (of the difference scores) is 0. We will now consider each of these two new aspects.

Difference Scores

With a repeated-measures design, your sample includes two scores for each person instead of just one. The way you handle this is to make the two scores per person into one

Answers

1. The sample's variance will in general be smaller than the variance of the population the sample is taken from.

2. In the usual formula you divide by the number of participants (N); in the formula for estimating a population's variance from the scores in a sample, you divide by the number of participants in the sample minus 1 (that is, N − 1).

3. (a) Degrees of freedom are the number of scores free to vary. (b) The degrees of freedom in a t test for a single sample are the number of scores in the sample minus 1. (c) In estimating the population variance, the formula is the sum of squared deviations divided by the degrees of freedom. (d) t distributions differ slightly from each other according to the degrees of freedom.

4. (a) A t distribution differs from a normal curve in that it has heavier tails; that is, more scores at the extremes. (b) The more degrees of freedom, the closer the shape (including the tails) is to a normal curve. (c) The cutoffs for significance are more extreme for a t distribution than for a normal curve.

5. In the t test you (a) estimate the population variance from the sample (it is not known in advance); (b) look up the cutoff on a t table, in which you also have to take into account the degrees of freedom (you don't use a normal curve table); and (c) your sample's score on the comparison distribution, which is a t distribution (not a normal curve), is a t score (not a Z score).


score per person! You do this magic by creating difference scores: for each person, you subtract one score from the other. If the difference is before versus after, difference scores are also called change scores.

Consider the example of the quality of men's communication before and after receiving premarital counseling. The psychologist subtracts the communication quality score before the counseling from the communication quality score after the counseling. This gives an after-minus-before difference score for each man. When the two scores are a before-score and an after-score, we usually take the after-score minus the before-score to indicate the change.

Once you have the difference score for each person in the study, you do the rest of the hypothesis testing with difference scores. That is, you treat the study as if there were a single sample of scores (scores that in this situation happen to be difference scores).
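Turning two scores per person into one is a single subtraction per person. A minimal Python sketch (the before and after scores here are made up purely for illustration, not data from the chapter):

```python
# Hypothetical before/after scores for four people (illustration only).
before = [50, 62, 47, 55]
after = [58, 60, 55, 63]

# After-minus-before change scores, one per person; the rest of the
# hypothesis testing then treats this list as a single sample.
change = [a - b for a, b in zip(after, before)]
print(change)  # [8, -2, 8, 8]
```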

Population of Difference Scores with a Mean of 0

So far in the research situations we have considered in this book, you have always known the mean of the population to which you compared your sample's mean. For example, in the college dormitory survey of hours studied, you knew the population mean was 17 hours. However, now we are using difference scores, and we usually don't know the mean of the population of difference scores.

Here is the solution. Ordinarily, the null hypothesis in a repeated-measures design is that on the average there is no difference between the two groups of scores. For example, the null hypothesis in a study of the quality of men's communication before and after receiving premarital counseling is that on the average there is no difference between communication quality before and after the counseling. What does no difference mean? Saying there is on the average no difference is the same as saying that the mean of the population of the difference scores is 0. Therefore, when working with difference scores, you are comparing the population of difference scores that your sample of difference scores comes from to a population of difference scores with a mean of 0. In other words, with a t test for dependent means, what we call Population 2 will ordinarily have a mean of 0 (that is, it is a population of difference scores that has a mean of 0).

Example of a t Test for Dependent Means

Olthoff (1989) tested the communication quality of couples three months before and again three months after marriage. One group studied was 19 couples who had received ordinary (very minimal) premarital counseling from the ministers who were going to marry them. (To keep the example simple, we will focus on just this one group and only on the husbands in the group. Scores for wives were similar, though somewhat more varied, making it a more complicated example for learning the t test procedure.)

The scores for the 19 husbands are listed in the "Before" and "After" columns in Table 7–6, followed by all the t test figuring. (The distributions involved are shown in Figure 7–6.) The crucial column for starting the analysis is the difference scores. For example, the first husband, whose communication quality was 126 before marriage and 115 after, had a difference of −11. (We figured after minus before, so that an increase is positive and a decrease, as for this husband, is negative.) The mean of the difference scores is −12.05. That is, on the average, these 19 husbands' communication quality decreased by about 12 points.

Is this decrease significant? In other words, how likely is it that this sample of difference scores is a random sample from a population of difference scores whose mean is 0?

difference scores: difference between a person's score on one testing and the same person's score on another testing; often an after-score minus a before-score, in which case it is also called a change score.


❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Husbands who receive ordinary premarital counseling.
Population 2: Husbands whose communication quality does not change from before to after marriage. (In other words, it is a population of husbands whose mean difference in communication quality from before to after marriage is 0.)

The research hypothesis is that Population 1's mean difference score (communication quality after marriage minus communication quality before marriage) is different from Population 2's mean difference score (of zero). That is, the

Table 7–6 t Test for Communication Quality Scores Before and After Marriage for 19 Husbands Who Received Ordinary Premarital Counseling

| Husband | Before | After | Difference (After − Before) | Deviation (Difference − M) | Squared Deviation |
| --- | --- | --- | --- | --- | --- |
| A | 126 | 115 | −11 | 1.05 | 1.10 |
| B | 133 | 125 | −8 | 4.05 | 16.40 |
| C | 126 | 96 | −30 | −17.95 | 322.20 |
| D | 115 | 115 | 0 | 12.05 | 145.20 |
| E | 108 | 119 | 11 | 23.05 | 531.30 |
| F | 109 | 82 | −27 | −14.95 | 223.50 |
| G | 124 | 93 | −31 | −18.95 | 359.10 |
| H | 98 | 109 | 11 | 23.05 | 531.30 |
| I | 95 | 72 | −23 | −10.95 | 119.90 |
| J | 120 | 104 | −16 | −3.95 | 15.60 |
| K | 118 | 107 | −11 | 1.05 | 1.10 |
| L | 126 | 118 | −8 | 4.05 | 16.40 |
| M | 121 | 102 | −19 | −6.95 | 48.30 |
| N | 116 | 115 | −1 | 11.05 | 122.10 |
| O | 94 | 83 | −11 | 1.05 | 1.10 |
| P | 105 | 87 | −18 | −5.95 | 35.40 |
| Q | 123 | 121 | −2 | 10.05 | 101.00 |
| R | 125 | 100 | −25 | −12.95 | 167.70 |
| S | 128 | 118 | −10 | 2.05 | 4.20 |
| Σ: | 2,210 | 1,981 | −229 | | 2,762.90 |

For difference scores:
M = −229/19 = −12.05.
μ = 0 (assumed as a no-change baseline of comparison).
S² = SS/df = 2,762.90/(19 − 1) = 153.49.
S²M = S²/N = 153.49/19 = 8.08.
SM = √S²M = √8.08 = 2.84.
t with df = 18 needed for 5% level, two-tailed = ±2.101.
t = (M − μ)/SM = (−12.05 − 0)/2.84 = −4.24.
Decision: Reject the null hypothesis.

Source: Data from Olthoff (1989).

TIP FOR SUCCESS: As in previous chapters, Population 2 is the population for when the null hypothesis is true.
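The figuring in Table 7–6 can be reproduced in a short Python sketch of our own, using the before and after scores from the table:

```python
import math

# Communication-quality scores for the 19 husbands in Table 7-6
# (Olthoff, 1989), three months before and after marriage.
before = [126, 133, 126, 115, 108, 109, 124, 98, 95, 120,
          118, 126, 121, 116, 94, 105, 123, 125, 128]
after = [115, 125, 96, 115, 119, 82, 93, 109, 72, 104,
         107, 118, 102, 115, 83, 87, 121, 100, 118]

diffs = [a - b for a, b in zip(after, before)]    # sum = -229
n = len(diffs)
m = sum(diffs) / n                                # M = -12.05
s2 = sum((d - m) ** 2 for d in diffs) / (n - 1)   # S^2 = 153.49
s_m = math.sqrt(s2 / n)                           # SM = 2.84
t = (m - 0) / s_m                                 # mu = 0 under the null

print(round(t, 2))  # -4.24; beyond the two-tailed cutoff of -2.101, so reject
```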


research hypothesis is that husbands who receive ordinary premarital counseling, like the husbands Olthoff studied, do change in communication quality from before to after marriage. The null hypothesis is that the populations are the same—that the husbands who receive ordinary premarital counseling do not change in their communication quality from before to after marriage.

Notice that you have no actual information about Population 2 husbands. The husbands in the study are a sample of Population 1 husbands. For the purposes of hypothesis testing, you set up Population 2 as a kind of straw man comparison group. That is, for the purpose of the analysis, you set up a comparison group of husbands who, if measured before and after marriage, would on the average show no difference.

❷ Determine the characteristics of the comparison distribution. If the null hypothesis is true, the mean of the population of difference scores is 0. The variance of the population of difference scores can be estimated from the sample of difference scores. As shown in Table 7–6, the sum of squared deviations of the difference scores from the mean of the difference scores is 2,762.90. With 19 husbands in the study, there are 18 degrees of freedom. Dividing the sum of squared deviation scores by the degrees of freedom gives an estimated population variance of difference scores of 153.49.

The distribution of means (from this population of difference scores) has a mean of 0, the same as the mean of the population of difference scores. The variance of the distribution of means of difference scores is the estimated population variance of difference scores (153.49) divided by the sample size (19), which

Figure 7–6 Distributions for the Olthoff (1989) example of a t test for dependent means: the population of difference scores (mean 0), the comparison distribution (a t distribution), and the sample (mean −12.05).


gives 8.08. The standard deviation of the distribution of means of difference scores is 2.84, the square root of 8.08. Because Olthoff was using an estimated population variance, the comparison distribution is a t distribution. The estimate of the population variance of difference scores is based on 18 degrees of freedom; so this comparison distribution is a t distribution for 18 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Olthoff used a two-tailed test to allow for either an increase or decrease in communication quality. Using the .05 significance level and 18 degrees of freedom, Table A–2 shows cutoff t scores of +2.101 and −2.101.

❹ Determine your sample's score on the comparison distribution. Olthoff's sample had a mean difference score of −12.05. That is, the mean was 12.05 points below the mean of 0 on the distribution of means of difference scores. The standard deviation of the distribution of means of difference scores is 2.84. Thus, the mean of the difference scores of −12.05 is 4.24 standard deviations below the mean of the distribution of means of difference scores. So Olthoff's sample of difference scores has a t score of −4.24.

❺ Decide whether to reject the null hypothesis. The t of −4.24 for the sample of difference scores is more extreme than the needed t of ±2.101. Thus, you can reject the null hypothesis: Olthoff's husbands are from a population in which husbands' communication quality is different after marriage from what it was before (it is lower).

Olthoff's actual study was more complex. You may be interested to know that he found that the wives also showed this decrease in communication quality after marriage. But a group of similar engaged couples who were given special communication skills training by their ministers (much more than the usual short session) had no significant decline in marital communication quality after marriage. In fact, there is a great deal of research showing that on the average marital happiness declines steeply over time (VanLaningham et al., 2001). And many studies have now shown the value of a full course of premarital communications training. For example, a recent representative survey of 3,344 adults in the United States showed that those who had attended a premarital communication program had significantly greater marital satisfaction, had less marital conflict, and were 31% less likely to divorce (Stanley et al., 2006). Further, benefits were greatest for those with a college education!

Summary of Steps for a t Test for Dependent Means

Table 7–7 summarizes the steps for a t test for dependent means.5

A Second Example of a t Test for Dependent Means

Here is another example. A team of researchers examined the brain systems involved in human romantic love (Aron et al., 2005). One issue was whether romantic love engages a part of the brain called the caudate (a brain structure that is engaged when people win money, are given cocaine, and other such "rewards"). Thus, the researchers recruited people who had very recently fallen "madly in love." (For example, to be in the study participants had to think about their partner at least 80% of their waking hours.) Participants brought a picture of their beloved with them, plus a picture of a familiar, neutral person of the same age and sex as their beloved. Participants then went in to the functional magnetic resonance imaging (fMRI) machine and their brain was scanned while they looked at the two pictures—30 seconds at the neutral person's picture, 30 seconds at their beloved, 30 seconds at the neutral person, and so forth.

TIP FOR SUCCESS: Step ❷ of hypothesis testing for the t test for dependent means is more complex than previously. This can make it easy to lose track of the purpose of this step. Step ❷ of hypothesis testing determines the characteristics of the comparison distribution. In the case of the t test for dependent means, this comparison distribution is a distribution of means of difference scores. The key characteristics of this distribution are its mean (which is assumed to equal 0), its standard deviation (which is estimated as SM), and its shape (a t distribution with degrees of freedom equal to the sample size minus 1).

TIP FOR SUCCESS: You now have to deal with some rather complex terms, such as the standard deviation of the distribution of means of difference scores. Although these terms are complex, there is good logic behind them. The best way to understand such terms is to break them down into manageable pieces. For example, you will notice that these new terms are the same as the terms for the t test for a single sample, with the added phrase "of difference scores." This phrase has been added because all of the figuring for the t test for dependent means uses difference scores.


Table 7–8 shows average brain activations (mean fMRI scanner values) in the caudate area of interest during the two kinds of pictures. (We have simplified the example for teaching purposes, including using only 10 participants when the actual study had 17.) It also shows the figuring of the difference scores and all the other

Table 7–7 Steps for a t Test for Dependent Means

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.

❷ Determine the characteristics of the comparison distribution.

a. Make each person's two scores into a difference score. Do all the remaining steps using these difference scores.

b. Figure the mean of the difference scores.

c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.

d. The standard deviation of the distribution of means of difference scores is figured as follows:

●A Figure the estimated population variance of difference scores: S² = SS/df.

●B Figure the variance of the distribution of means of difference scores: S²M = S²/N.

●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M.

e. The shape is a t distribution with df = N − 1.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.

a. Decide the significance level and whether to use a one-tailed or a two-tailed test.

b. Look up the appropriate cutoff in a t table.

❹ Determine your sample's score on the comparison distribution: t = (M − μ)/SM.

❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.

Table 7–8 t Test for a Study of Romantic Love and Brain Activation in Part of the Caudate

| Student | Brain Activation: Beloved's photo | Brain Activation: Control photo | Difference (Beloved − Control) | Deviation (Difference − M) | Squared Deviation |
| --- | --- | --- | --- | --- | --- |
| 1 | 1487.8 | 1487.2 | .6 | −.800 | .640 |
| 2 | 1329.4 | 1328.1 | 1.3 | −.100 | .010 |
| 3 | 1407.9 | 1405.9 | 2.0 | .600 | .360 |
| 4 | 1236.1 | 1234.0 | 2.1 | .700 | .490 |
| 5 | 1299.8 | 1298.2 | 1.6 | .200 | .040 |
| 6 | 1447.2 | 1444.7 | 2.5 | 1.100 | 1.210 |
| 7 | 1354.1 | 1354.3 | −.2 | −1.600 | 2.560 |
| 8 | 1204.6 | 1203.7 | .9 | −.500 | .250 |
| 9 | 1322.3 | 1320.8 | 1.5 | .100 | .010 |
| 10 | 1388.5 | 1386.8 | 1.7 | .300 | .090 |
| Σ: | 13,477.7 | 13,463.7 | 14.0 | | 5.660 |

For difference scores:
M = 14.0/10 = 1.400.
μ = 0 (assumed as a no-change baseline of comparison).
S² = SS/df = 5.660/(10 − 1) = 5.660/9 = .629.
S²M = S²/N = .629/10 = .063.
SM = √S²M = √.063 = .251.
t with df = 9 needed for 5% level, one-tailed = 1.833.
t = (M − μ)/SM = (1.400 − 0)/.251 = 5.58.
Decision: Reject the null hypothesis.

Source: Data based on Aron et al. (2005).
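Likewise, the figuring in Table 7–8 can be reproduced in Python (our own sketch, using the activation values from the table); here the test is one-tailed, so only a t beyond +1.833 rejects the null hypothesis:

```python
import math

# Caudate activation while viewing the beloved's vs. the control photo
# (Table 7-8, data based on Aron et al., 2005).
beloved = [1487.8, 1329.4, 1407.9, 1236.1, 1299.8,
           1447.2, 1354.1, 1204.6, 1322.3, 1388.5]
control = [1487.2, 1328.1, 1405.9, 1234.0, 1298.2,
           1444.7, 1354.3, 1203.7, 1320.8, 1386.8]

diffs = [b - c for b, c in zip(beloved, control)]  # sum = 14.0
n = len(diffs)
m = sum(diffs) / n                                 # M = 1.400
s2 = sum((d - m) ** 2 for d in diffs) / (n - 1)    # S^2 = .629
s_m = math.sqrt(s2 / n)                            # SM = .251
t = (m - 0) / s_m                                  # mu = 0 under the null

print(round(t, 2))  # 5.58; beyond the one-tailed cutoff of 1.833, so reject
```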


figuring for the t test for dependent means. Figure 7–7 shows the distributions involved. Here are the steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Individuals like those tested in this study. Population 2: Individuals whose brain activation in the caudate area of interest is the same when looking at a picture of their beloved and a picture of a familiar, neutral person.

The research hypothesis is that Population 1's mean difference score (brain activation when viewing the beloved's picture minus brain activation when viewing the neutral person's picture) is greater than Population 2's mean difference score (of no difference). That is, the research hypothesis is that brain activation in the caudate area of interest is greater when viewing the beloved person's picture than when viewing the neutral person's picture. The null hypothesis is that Population 1's mean difference score is not greater than Population 2's. That is, the null hypothesis is that brain activation in the caudate area of interest is not greater when viewing the beloved person's picture than when viewing the neutral person's picture.

❷ Determine the characteristics of the comparison distribution.

a. Make each person's two scores into a difference score. This is shown in the column labeled "Difference" in Table 7–8. You do all the remaining steps using these difference scores.

Figure 7–7 Distributions for the example of romantic love and brain activation in part of the caudate: the population of difference scores (mean 0), the comparison distribution (a t distribution), and the sample (mean 1.400).


b. Figure the mean of the difference scores. The sum of the difference scores (14.0) divided by the number of difference scores (10) gives a mean of the difference scores of 1.400. So, M = 1.400.

c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.

d. The standard deviation of the distribution of means of difference scores is figured as follows:

●A Figure the estimated population variance of difference scores: S² = SS/df = 5.660/(10 − 1) = .629.

●B Figure the variance of the distribution of means of difference scores: S²M = S²/N = .629/10 = .063.

●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M = √.063 = .251.

e. The shape is a t distribution with df = N − 1. Therefore, the comparison distribution is a t distribution for 9 degrees of freedom. It is a t distribution because we figured its variance based on an estimated population variance. It has 9 degrees of freedom because there were 9 degrees of freedom in the estimate of the population variance.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.

a. We will use the standard .05 significance level. This is a one-tailed test because the researchers were interested only in a specific direction of difference.

b. Using the .05 significance level with 9 degrees of freedom, Table A–2 shows a cutoff t of 1.833. In Table 7–8, the difference score is figured as brain activation when viewing the beloved's picture minus brain activation when viewing the neutral person's picture. Thus, the research hypothesis predicts a positive difference score, which means that our cutoff is +1.833.

❹ Determine your sample's score on the comparison distribution. t = (M − μ)/SM = (1.400 − 0)/.251 = 5.58. The sample's mean difference of 1.400 is 5.58 standard deviations (of .251 each) above the mean of 0 on the distribution of means of difference scores.

➎ Decide whether to reject the null hypothesis. The sample's t score of 5.58 is more extreme than the cutoff t of 1.833. You can reject the null hypothesis. Brain activation in the caudate area of interest is greater when viewing a beloved's picture than when viewing a neutral person's picture. The results of this study are not limited to North Americans. Recently, the study was replicated, with virtually identical results, in Beijing with Chinese students who were intensely in love (Xu et al., 2007).

t Test for Dependent Means with Scores from Pairs of Research Participants The t test for dependent means is also called a paired-samples t test, t test for correlated means, t test for matched samples, and t test for matched pairs. Each of these names comes from the same idea that in this kind of t test you are comparing two sets of scores that are related to each other in a direct way. In the t test for dependent means examples in this chapter, the two sets of scores have been related because each individual had a score in both sets of scores (for example, a score before a procedure and a score after a procedure). However, you can also use a t test for dependent means with scores from pairs of research participants, considering each pair as if it were one person, and figur- ing the difference score for each pair. For example, suppose you have 30 married cou- ples and want to test whether wives consistently do more housework than husbands.

t = (M – �)>SM = (1.400 – 0)>.251 = 5.58

+1.833

df = N – 1 SM = 2S2M = 2.063 = .251.

S2M = S2 > N = .629 > 10 = .063.

S2 = SS>df = 5.660>(10 – 1) = .629

� = 0. M = 1.400


You could figure for each couple a difference score of the wife’s hours of housework per week minus her husband’s number of hours of housework per week. There are also situations in which experimenters create pairs. For example, a researcher might put participants into pairs to do a puzzle task together and, for each pair, assign one to be a leader and one a follower. At the end of the study, participants privately fill out a questionnaire about how much they enjoyed the interaction. The procedure for analyzing this study would be to create a difference score for each pair by taking the enjoyment rating of the leader minus the enjoyment rating of the follower.
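In code, treating each pair as one "person" simply means producing one difference score per pair before running the usual t test for dependent means. A minimal sketch (the housework hours below are hypothetical, not data from the text):

```python
# Hypothetical housework hours per week for five couples (wife, husband);
# illustrative numbers only, not from the text.
couples = [(9, 4), (12, 6), (7, 7), (10, 3), (8, 5)]

# One difference score per pair: wife's hours minus husband's hours.
diffs = [wife - husband for wife, husband in couples]
print(diffs)  # [5, 6, 0, 7, 3]
```

The t test for dependent means is then carried out on these difference scores exactly as with before/after scores from a single person.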

Review and Comparison of the Z Test, the t Test for a Single Sample, and the t Test for Dependent Means

In Chapter 5 you learned about the Z test; in this chapter you have learned about the t test for a single sample and the t test for dependent means. Table 7–9 provides a review and comparison of the Z test, the t test for a single sample, and the t test for dependent means.

TIP FOR SUCCESS: We recommend that you spend some time carefully going through Table 7–9. Test your understanding of the different tests by covering up portions of the table and trying to recall the hidden information. Also, take a look at Chapter Note 3 (page 268) for a discussion of the terminology used in the formulas.

Table 7–9 Review of the Z Test, the t Test for a Single Sample, and the t Test for Dependent Means

| Features | Z Test | t Test for a Single Sample | t Test for Dependent Means |
|---|---|---|---|
| Population variance is known | Yes | No | No |
| Population mean is known | Yes | Yes | No |
| Number of scores for each participant | 1 | 1 | 2 |
| Shape of comparison distribution | Z distribution | t distribution | t distribution |
| Formula for degrees of freedom | Not applicable | df = N − 1 | df = N − 1 |
| Formula | Z = (M − μ_M)/σ_M | t = (M − μ)/S_M | t = (M − μ)/S_M |
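The decision rules summarized in Table 7–9 can be expressed in a few lines of code. This helper function is our own illustration of the table’s logic, not something from the text:

```python
def choose_test(pop_variance_known: bool, scores_per_participant: int) -> str:
    """Pick the hypothesis test, following the decision rules of Table 7-9."""
    if scores_per_participant == 2:
        return "t test for dependent means"   # population mean unknown
    if pop_variance_known:
        return "Z test"                       # variance and mean both known
    return "t test for a single sample"       # mean known, variance estimated

print(choose_test(False, 2))  # t test for dependent means
```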

How are you doing?

1. Describe the situation in which you would use a t test for dependent means.
2. When doing a t test for dependent means, what do you do with the two scores you have for each participant?
3. In a t test for dependent means, (a) what is usually considered to be the mean of the “known” population (Population 2)? (b) Why?
4. Five individuals are tested before and after an experimental procedure; their scores are given in the following table. Test the hypothesis that there is no change, using the .05 significance level. (a) Use the steps of hypothesis testing and (b) sketch the distributions involved.

   | Person | Before | After |
   |---|---|---|
   | 1 | 20 | 30 |
   | 2 | 30 | 50 |
   | 3 | 20 | 10 |
   | 4 | 40 | 30 |
   | 5 | 30 | 40 |


5. What about the research situation makes the difference in whether you should carry out a Z test or a t test for a single sample?

6. What about the research situation makes the difference in whether you should carry out a t test for a single sample or a t test for dependent means?

[Figure 7–8 Distributions for answer to “How Are You Doing?” question 4: the population of difference scores, the comparison distribution (a t distribution with mean 0), and the sample (mean difference score of 4.0), shown in both raw scores and t scores.]

Answers

1. A t test for dependent means is used when you are doing hypothesis testing and you have two scores for each participant (such as a before-score and an after-score) and the population variance is unknown. It is also used when a study compares participants who are organized into pairs.

2. Subtract one from the other to create a difference (or change) score for each person. The t test is then done with these difference (or change) scores.

3. (a) The mean of the “known” population (Population 2) is 0. (b) You are comparing your sample to a situation in which there is no difference: a population of difference scores in which the average difference is 0.

4. (a) Steps of hypothesis testing (all figuring is shown in Table 7–10):

   ❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

   Population 1: People like those tested before and after the experimental procedure.
   Population 2: People whose scores are the same before and after the experimental procedure.

   The research hypothesis is that Population 1’s mean change score (after minus before) is different from Population 2’s. The null hypothesis is that Population 1’s mean change score is the same as Population 2’s.

   ❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means of difference scores (the comparison distribution) is 0; the standard deviation of the distribution of means of difference scores is 6; it is a t distribution with 4 degrees of freedom.

   ❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a two-tailed test at the .05 level, the cutoff sample scores are −2.776 and +2.776.

   ❹ Determine your sample’s score on the comparison distribution. t = (4 − 0)/6 = .67.

   ➎ Decide whether to reject the null hypothesis. The sample’s t score of .67 is not more extreme than the cutoff t of ±2.776. Therefore, do not reject the null hypothesis.

   Table 7–10 Figuring for Answer to “How Are You Doing?” Question 4

   | Person | Before | After | Difference (After − Before) | Deviation (Difference − M) | Squared Deviation |
   |---|---|---|---|---|---|
   | 1 | 20 | 30 | 10 | 6 | 36 |
   | 2 | 30 | 50 | 20 | 16 | 256 |
   | 3 | 20 | 10 | −10 | −14 | 196 |
   | 4 | 40 | 30 | −10 | −14 | 196 |
   | 5 | 30 | 40 | 10 | 6 | 36 |
   | Σ | 140 | 160 | 20 | 0 | 720 |

   For difference scores: M = 20/5 = 4.00. μ = 0. S² = SS/df = 720/(5 − 1) = 720/4 = 180. S²_M = S²/N = 180/5 = 36. S_M = √S²_M = √36 = 6.
   t needed for df = 4, 5% significance level, two-tailed = ±2.776.
   t = (M − μ)/S_M = (4 − 0)/6 = .67.
   Decision: Do not reject the null hypothesis.

   (b) The distributions are shown in Figure 7–8.

5. As shown in Table 7–9, whether the population variance is known determines whether you should carry out a Z test or a t test for a single sample. You use a Z test when the population variance is known and you use the t test for a single sample when it is not known.

6. As shown in Table 7–9, whether the population mean is known and whether there are one or two scores for each participant determines whether you should carry out a t test for a single sample or a t test for dependent means. You use a t test for a single sample when you know the population mean and you have one score for each participant; you use the t test for dependent means when you do not know the population mean and there are two scores for each participant.
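The figuring in Table 7–10 can be checked with a short script. The five before/after scores below are the data from question 4:

```python
import math

before = [20, 30, 20, 40, 30]
after = [30, 50, 10, 30, 40]

diffs = [a - b for a, b in zip(after, before)]  # [10, 20, -10, -10, 10]
n = len(diffs)
m = sum(diffs) / n                              # M = 4.0
ss = sum((d - m) ** 2 for d in diffs)           # SS = 720
s2 = ss / (n - 1)                               # S^2 = 180
s_m = math.sqrt(s2 / n)                         # S_M = 6.0
t = (m - 0) / s_m                               # t = .67

print(round(t, 2))  # 0.67 -- not beyond the cutoff of 2.776
```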


Assumptions of the t Test for a Single Sample and the t Test for Dependent Means

As we have seen, when you are using an estimated population variance, the comparison distribution is a t distribution. However, the comparison distribution will be exactly a t distribution only if the distribution of individuals follows a normal curve. Otherwise, the comparison distribution will follow some other (usually unknown) shape.

Thus, strictly speaking, a normal population is a requirement within the logic and mathematics of the t test. A requirement like this for a hypothesis-testing procedure is called an assumption. That is, a normal population distribution is one assumption of the t test. The effect of this assumption is that if the population distribution is not normal, the comparison distribution will be some indeterminate shape other than a t distribution, and thus the cutoffs on the t table will be incorrect.

Unfortunately, when you do a t test, you don’t know whether the population is normal. This is because, when doing a t test, usually all you have to go on are the scores in your sample. Fortunately, however, as we saw in Chapter 3, distributions in psychology research quite often approximate a normal curve. (This also applies to distributions of difference scores.) Also, statisticians have found that, in practice, you get reasonably accurate results with t tests even when the population is rather far from normal. In other words, the t test is said to be robust over moderate violations of the assumption of a normal population distribution. How statisticians figure out the robustness of a test is an interesting topic, which is described in Box 8–1 in Chapter 8.

The only very common situation in which using a t test for dependent means is likely to give a seriously distorted result is when you are using a one-tailed test and the population is highly skewed (is very asymmetrical, with a much longer tail on one side than the other). Thus, you need to be cautious about your conclusions when doing a one-tailed test if the sample of difference scores is highly skewed, suggesting the population it comes from is also highly skewed.
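As a rough screen for that situation, you can compute the skewness of your sample of difference scores. The formula below is the common adjusted Fisher–Pearson skewness estimate; it is our addition, not from the text, and the text gives no numeric cutoff for “highly skewed,” so any threshold is a judgment call:

```python
def sample_skewness(xs):
    # Adjusted Fisher-Pearson skewness estimate (a common formula,
    # used here only as a rough screen for asymmetry).
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5
    return sum(((x - m) / s) ** 3 for x in xs) * n / ((n - 1) * (n - 2))

print(sample_skewness([1, 1, 1, 5]))  # 2.0 -- a long right tail
```

A value near 0 suggests symmetry; values far from 0 in either direction suggest the kind of skew that calls for caution with a one-tailed test.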

Effect Size and Power for the t Test for Dependent Means

Effect Size

You can figure the effect size for a study using a t test for dependent means the same way as in Chapter 6.⁶ d is the difference between the population means divided by the population standard deviation: d = (μ₁ − μ₂)/σ. When using this formula for a t test for dependent means, μ₁ is the predicted mean of the population of difference scores, μ₂ (the “known” population mean) is almost always 0, and σ stands for the standard deviation of the population of difference scores. The conventions for effect size for a t test for dependent means are also the same as you learned for the situation we considered in Chapter 6: a small effect size is .20, a medium effect size is .50, and a large effect size is .80.

Consider an example. A sports psychologist plans a study on attitudes toward teammates before versus after a game. She will administer an attitude questionnaire twice, once before and once after a game. Suppose that the smallest before-after difference that would be of any importance is 4 points on the questionnaire. Also suppose that, based on related research, the researcher figures that the standard deviation of difference scores on this attitude questionnaire is about 8 points. Thus, μ₁ = 4 and σ = 8. Applying the effect size formula, d = (μ₁ − μ₂)/σ = (4 − 0)/8 = .50. In terms of the effect size conventions, her planned study has a medium effect size.
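The effect size formula is a one-liner in code. The values plugged in below are those of the sports psychology example:

```python
def effect_size_d(mu1, mu2=0.0, sigma=1.0):
    # d = (mu1 - mu2) / sigma; for a t test for dependent means,
    # mu1 is the predicted mean of the population of difference
    # scores and mu2 is almost always 0.
    return (mu1 - mu2) / sigma

print(effect_size_d(4, 0, 8))  # 0.5 -- a medium effect size
```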

assumption  condition, such as a population’s having a normal distribution, required for carrying out a particular hypothesis-testing procedure; a part of the mathematical foundation for the accuracy of the tables used in determining cutoff values.

robustness  extent to which a particular hypothesis-testing procedure is reasonably accurate even when its assumptions are violated.


To estimate the effect size after a study, use the actual mean of your sample’s difference scores as your estimate of μ₁, and use S (for the population of difference scores) as your estimate of σ.

Consider our first example of a t test for dependent means, the study of husbands’ change in communication quality. In that study, the mean of the difference scores was −12.05. The estimated population standard deviation of the difference scores would be 12.39. That is, we figured the estimated variance of the difference scores (S²) to be 153.49; √S² = √153.49 = 12.39. Therefore, the estimated effect size is d = (μ₁ − μ₂)/σ = (M − 0)/S = (−12.05 − 0)/12.39 = −.97. This is a very large effect size. (The negative sign for the effect size means that the large effect was a decrease.)

Power

Power for a t test for dependent means can be determined using a power table, a power software package, or an Internet power calculator. Table 7–11 gives the approximate power at the .05 significance level for small, medium, and large effect sizes and one-tailed and two-tailed tests. In the sports psychology example, the researcher expected a medium effect size (d = .50). If she planned to conduct the study using the .05 level, two-tailed, with 20 participants, the study would have a power of .59. This means that, if the research hypothesis is true and has a medium effect size, there is a 59% chance that this study will come out significant.

The power table (Table 7–11) is also useful when you are reading about a nonsignificant result in a published study. Suppose that a study using a t test for dependent means has a nonsignificant result. The study tested significance at the .05 level, was two-tailed, and had 10 participants. Should you conclude that there is in fact no difference at all in the populations? Probably not. Even assuming a medium effect size, Table 7–11 shows that there is only a 32% chance of getting a significant result in this study.
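The estimated effect size from the husbands’ communication example can be reproduced directly from the summary figures given above:

```python
m = -12.05         # mean of the sample's difference scores
s = 153.49 ** 0.5  # estimated population SD of difference scores, about 12.39

d = (m - 0) / s    # estimated effect size
print(round(d, 2)) # -0.97 -- a very large (negative-direction) effect
```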

TIP FOR SUCCESS: Recall from Chapter 6 that power can be expressed as a probability (such as .71) or as a percentage (such as 71%). Power is expressed as a probability in Table 7–11 (as well as in power tables in later chapters).

Table 7–11 Approximate Power for Studies Using the t Test for Dependent Means for Testing Hypotheses at the .05 Significance Level

| Difference Scores in Sample (N) | Small (d = .20) | Medium (d = .50) | Large (d = .80) |
|---|---|---|---|
| One-tailed test |  |  |  |
| 10 | .15 | .46 | .78 |
| 20 | .22 | .71 | .96 |
| 30 | .29 | .86 | * |
| 40 | .35 | .93 | * |
| 50 | .40 | .97 | * |
| 100 | .63 | * | * |
| Two-tailed test |  |  |  |
| 10 | .09 | .32 | .66 |
| 20 | .14 | .59 | .93 |
| 30 | .19 | .77 | .99 |
| 40 | .24 | .88 | * |
| 50 | .29 | .94 | * |
| 100 | .55 | * | * |

*Power is nearly 1.
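Table values like these come from the t distribution, but a quick normal-approximation sketch reproduces them closely. The formula below is our own approximation, not the book’s method; because it ignores the extra spread of the t distribution, it runs slightly above the table values for small N:

```python
import math

def approx_power(d, n, two_tailed=True):
    # Normal-approximation sketch of power at the .05 level:
    # power ~ Phi(d * sqrt(N) - z_crit).
    z_crit = 1.96 if two_tailed else 1.645
    x = d * math.sqrt(n) - z_crit
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))  # Phi(x)

print(round(approx_power(0.5, 20), 2))  # 0.61 -- Table 7-11 gives .59
```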


Consider another study that was not significant. This study also used the .05 significance level, two-tailed. This study had 100 research participants. Table 7–11 tells you that there would be a 55% chance of the study’s coming out significant if there were even a true small effect size in the population. If there were a medium effect size in the population, the table indicates that there is almost a 100% chance that this study would have come out significant. Thus, in this study with 100 participants, we could conclude from the results that in the population there is probably at most a small difference.

To keep Table 7–11 simple, we have given power figures for only a few different numbers of participants (10, 20, 30, 40, 50, and 100). This should be adequate for the kinds of rough evaluations you need to make when evaluating results of research articles.⁷

Planning Sample Size

Table 7–12 gives the approximate number of participants needed for 80% power for a planned study. (Eighty percent is a common figure used by researchers for the minimum power to make a study worth doing.) Suppose you plan a study in which you expect a large effect size and you use the .05 significance level, two-tailed. The table shows you would only need 14 participants to have 80% power. On the other hand, a study using the same significance level, also two-tailed, but in which you expect only a small effect size would need 196 participants for 80% power.⁸

How are you doing?

1. (a) What is an assumption in hypothesis testing? (b) Describe a specific assumption for a t test for dependent means. (c) What is the effect of violating this assumption? (d) What does it mean to say that the t test for dependent means is robust? (e) Describe a situation in which it is not robust.
2. How can you tell if you have violated the normal curve assumption?
3. (a) Write the formula for effect size; (b) describe each of its terms as they apply to a planned t test for dependent means; (c) describe what you use for each of its terms in figuring effect size for a completed study that used a t test for dependent means.
4. You are planning a study in which you predict the mean of the population of difference scores to be 40, and the population standard deviation is 80. You plan to test significance using a t test for dependent means, one-tailed, with an alpha of .05. (a) What is the predicted effect size? (b) What is the power of this study if you carry it out with 20 participants? (c) How many participants would you need to have 80% power?

Table 7–12 Approximate Number of Research Participants Needed for 80% Power for the t Test for Dependent Means in Testing Hypotheses at the .05 Significance Level

| Effect Size | Small (d = .20) | Medium (d = .50) | Large (d = .80) |
|---|---|---|---|
| One-tailed | 156 | 26 | 12 |
| Two-tailed | 196 | 33 | 14 |
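The sample sizes in Table 7–12 can be approximated by inverting the power calculation. The formula below is a normal-approximation sketch of our own, not the book’s method; the z values assumed are .84 for 80% power and 1.96 for a two-tailed .05 cutoff:

```python
def approx_n_for_power(d, z_power=0.84, z_crit=1.96):
    # Normal-approximation sketch: N ~ ((z_power + z_crit) / d) ** 2,
    # rounded to a whole number of participants. Slightly underestimates
    # the t-based table values for medium and large effects.
    return round(((z_power + z_crit) / d) ** 2)

print(approx_n_for_power(0.2))  # 196 -- matches Table 7-12 for a small effect
```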


Controversy: Advantages and Disadvantages of Repeated-Measures Designs

The main controversies about t tests have to do with their relative advantages and disadvantages compared to various alternatives (alternatives we will discuss in Chapter 14). There is, however, one consideration that we want to comment on now. It is about all research designs in which the same participants are tested before and after some experimental intervention (the kind of situation the t test for dependent means is often used for).

Studies using difference scores (that is, studies using a repeated-measures design) often have much larger effect sizes for the same amount of expected difference between means than other kinds of research designs. That is, testing each of a group of participants twice (once under one condition and once under a different condition) usually produces a study with high power. In particular, this kind of study gives more power than dividing the participants up into two groups and testing each group once (one group tested under one condition and the other tested under another condition). In fact, studies using difference scores usually have even more power than those in which you have twice as many participants, but each is tested only once.
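This power advantage can be quantified with the standard variance rule Var(X − Y) = Var(X) + Var(Y) − 2·r·SD(X)·SD(Y): when the two testings are correlated, the standard deviation of the difference scores shrinks well below the per-testing standard deviation. The SD and correlation values below are hypothetical, chosen only to illustrate:

```python
import math

sd = 10.0  # SD of scores at each testing (hypothetical)
r = 0.8    # correlation between the two testings (hypothetical)

# SD of difference scores: sqrt(Var(X) + Var(Y) - 2*r*SD(X)*SD(Y))
sd_diff = math.sqrt(sd ** 2 + sd ** 2 - 2 * r * sd * sd)
print(round(sd_diff, 2))  # 6.32 -- well under the per-testing SD of 10
```

Since the standard deviation of the difference scores is the denominator of the effect size, a smaller denominator means a larger effect size and thus more power.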

Why do repeated-measures designs have so much power? The reason is that the standard deviation of difference scores is usually quite low. (The standard deviation of difference scores is what you divide by to get the effect size when using difference scores.) This produces a large effect size, which increases the power. In a repeated-measures design, the only variation is in the difference scores. Variation among participants on each testing’s scores is not part of the variation involved in the analysis. As an example, look back at Table 7–8 from our romantic love and brain imaging study. Notice that there were very great differences between the scores (fMRI scanner

Answers

1. (a) An assumption is a requirement that you must meet for the results of the hypothesis-testing procedure to be accurate. (b) The population of individuals’ difference scores is assumed to be a normal distribution. (c) The significance level cutoff from the t table is not accurate. (d) Unless you very strongly violate the assumption (that is, unless the population distribution is very far from normal), the cutoff is fairly accurate. (e) The t test for dependent means is not robust when you are doing a one-tailed test and the population distribution is highly skewed.

2. You look at the distribution of the sample of difference scores to see if it is dramatically different from a normal curve.

3. (a) d = (μ₁ − μ₂)/σ. (b) d is the effect size; μ₁ is the predicted mean of the population of difference scores; μ₂ is the mean of the known population, which for a population of difference scores is almost always 0; σ is the standard deviation of the population of difference scores. (c) To estimate μ₁, you use M, the actual mean of your sample’s difference scores; μ₂ remains 0; and for σ, you use S, the estimated standard deviation of the population of difference scores.

4. (a) Predicted effect size: d = (μ₁ − μ₂)/σ = (40 − 0)/80 = .50. (b) Power of this study: .71. (c) Number of participants for 80% power: 26.