Chapter 1 An Overview of IBM® SPSS® Statistics
Introduction: An Overview of IBM SPSS Statistics 23
THIS BOOK gives you the step-by-step instructions necessary to do most major types of data analysis using SPSS. The software was originally created by three Stanford graduate students in the late 1960s. The acronym “SPSS” initially stood for “Statistical Package for the Social Sciences.” As SPSS expanded their package to address the hard sciences and business markets, the name changed to “Statistical Product and Service Solutions.” In 2009 IBM purchased SPSS and the name morphed to “IBM SPSS Statistics.” SPSS is now such a standard in the industry that IBM has retained the name due to its recognizability. No one particularly cares what the letters “SPSS” stand for any longer. IBM SPSS Statistics is simply one of the world’s largest and most successful statistical software companies. In this book we refer to the program as SPSS.
For this book to be effective when you conduct data analysis with SPSS, you should have certain limited knowledge of statistics and have access to a computer that has the necessary resources to run SPSS. Each issue is addressed in the next two paragraphs.
STATISTICS You should have had at least a basic course in statistics or be in the process of taking such a course. While it is true that this book devotes the first two or three pages of each chapter to a description of the statistical procedure that follows, these descriptions are designed to refresh the reader’s memory, not to instruct the novice. While it is certainly possible for the novice to follow the steps in each chapter and get SPSS to produce pages of output, a fundamental grounding in statistics is important for an understanding of which procedures to use and what all the output means. In addition, while the first 16 chapters should be understandable by individuals with limited statistical background, the final 12 chapters deal with much more complex and involved types of analyses. These chapters require substantial grounding in the statistical techniques involved.
COMPUTER REQUIREMENTS You must:
· Have access to a personal computer that has
· Microsoft® Windows Vista® or Windows® 7 or 8.1 or 10; MAC OS® 10.8 (Mountain Lion) or higher installed
· IBM SPSS Statistics 23.0 installed
· Know how to turn the computer on
· Have a working knowledge of the keys on the keyboard and how to use a mouse, or another selection device such as keyboard strokes or a touch screen monitor.
This book will take you the rest of the way. If you are using SPSS on a network of computers (rather than your own PC or MAC) the steps necessary to access IBM SPSS Statistics may vary slightly from the single step shown in the pages that follow.
IBM SPSS Statistics is a complex and powerful statistical program by any standards. The software occupies about 800 MB of your hard drive and requires at least 1 GB of RAM to operate adequately. Despite its size and complexity, SPSS has created a program that is not only powerful but is user friendly (you’re the user; the program tries to be friendly). By improvements over the years, SPSS has done for data analysis what Henry Ford did for the automobile: made it available to the masses. SPSS is able to perform essentially any type of statistical analysis ever used in the social sciences, in the business world, and in other scientific disciplines.
This book was written for Version 23 of IBM SPSS Statistics. More specifically, the screen shots and output are based on Version 23.0. With some exceptions, what you see here will be similar to SPSS Version 7.0 and higher. Because only a few parts of SPSS are changed with each version, most of this book will apply to previous versions. It’s 100% up-to-date with Version 23.0, but it will lead you astray only about 2% of the time if you’re using Version 21.0 or 22 and is perhaps 60% accurate for Version 7.0 (if you can find a computer and software that old).
Our book covers the statistical procedures present in three of the modules created by SPSS that are most frequently used by researchers. A module (within the SPSS context) is simply a set of different statistical operations. We include the Base Module (technically called IBM SPSS Statistics Base), the module covering advanced statistics (IBM SPSS Advanced Statistics), and the module that addresses regression models (IBM SPSS Regression)—all described in greater detail later in this chapter. To support their program, SPSS has created a set of comprehensive manuals that cover all procedures these three modules are designed to perform. To a person fluent in statistics and data analysis, the manuals are well written and intelligently organized. To anyone less fluent, however, the organization is often undetectable, and the comprehensiveness (the equivalent of almost 2,000 pages of fine-print text) is overwhelming. To the best of our knowledge, hard-copy manuals are no longer available but most of this information may now be accessed from SPSS as PDF downloads. The same information is also available in the exhaustive online Help menu. Despite changes in the method of accessing this information, for sake of simplicity we still refer to this body of information as “SPSS manuals” or simply “manuals.” Our book is about 400 pages long. Clearly we cannot cover in 400 pages as much material as the manuals do in 2,000, but herein lies our advantage.
The purpose of this book is to make the fundamentals of most types of data analysis clear. To create this clarity requires the omission of much (often unnecessary) detail. Despite brevity, we have been keenly selective in what we have included and believe that the material presented here is sufficient to provide simple instructions that cover 95% of analyses ever conducted by researchers. Although we cannot substantiate that exact number, our time in the manuals suggests that at least 1,600 of the 2,000 pages involve detail that few researchers ever consider. How often do you really need 7 different methods of extracting and 6 methods of rotating factors in factor analysis, or 18 different methods for post-hoc comparisons after a one-way ANOVA? (By the way, that last sentence should be understood by statistical geeks only.)
We are in no way critical of the manuals; they do well what they are designed to do and we regard them as important adjuncts to the present book. When our space limitations prevent explanation of certain details, we often refer our readers to the SPSS manuals. Within the context of presenting a statistical procedure, we often show a window that includes several options but describe only one or two of them. This is done without apology except for the occasional “description of these options extends beyond the scope of this book,” after which we cheerfully refer you to the appropriate SPSS manual. The ultimate goal of this format is to create clarity without sacrificing necessary detail.
This chapter introduces the major concepts discussed in this book and gives a brief overview of the book’s organization and the basic tools that are needed in order to use it.
If you want to run a particular statistical procedure, have used IBM SPSS Statistics before, and already know which analysis you wish to conduct, you should read the Typographical and Formatting Conventions section in this chapter (pages 5–7) and then go to the appropriate chapter in the last portion of the book (Chapters 6 through 28). Those chapters will tell you exactly what steps you need to perform to produce the output you desire.
If, however, you are new to IBM SPSS Statistics, then this chapter will give you important background information that will be useful whenever you use this book.
1.4 This Book’s Organization, Chapter by Chapter
This book was created to describe the crucial concepts of analyzing data. There are three basic tasks associated with data analysis:
A. You must type data into the computer, and organize and format the data so both SPSS and you can identify it easily,
B. You must tell SPSS what type of analysis you wish to conduct, and
C. You must be able to interpret what the SPSS output means.
After this introductory chapter, Chapter 2 deals with basic operations such as types of SPSS windows, the use of the toolbar and menus, saving, viewing, and editing the output, printing output, and so forth. While this chapter has been created with the beginner in mind, there is much SPSS-specific information that should be useful to anyone. Chapter 3 addresses the first step mentioned above—creating, editing, and formatting a data file. The SPSS data editor is an instrument that makes the building, organizing, and formatting of data files wonderfully clear and straightforward.
Chapters 4 and 5 deal with two important issues—modification and transformation of data (Chapter 4) and creation of graphs or charts (Chapter 5). Chapter 4 deals specifically with different types of data manipulation, such as creating new variables, reordering, restructuring, merging files, or selecting subsets of data for analysis. Chapter 5 introduces the basic procedures used when making a number of different graphs; some graphs, however, are described more fully in the later chapters.
Chapters 6 through 28 then address Steps B and C—analyzing your data and interpreting the output. It is important to note that each of the analysis chapters is self-contained. If the beginner, for example, were instructed to conduct t tests on certain data, Chapter 11 would give complete instructions for accomplishing that procedure. In the Step by Step section, Step 1 is always “start the SPSS program” and refers the reader to Chapter 2 if there are questions about how to do this. The second step is always “create a data file or edit (if necessary) an already existing file,” and the reader is then referred to Chapter 3 for instructions if needed. Then the steps that follow explain exactly how to conduct a t test.
As mentioned previously, this book covers three modules produced by SPSS: IBM SPSS Statistics Base, IBM SPSS Advanced Statistics, and IBM SPSS Regression. Since some computers at colleges or universities may not have all of these modules (the Base module is always present), the book is organized according to the structure SPSS has imposed: We cover almost all procedures included in the Base module and then selected procedures from the more complex Advanced and Regression Modules. Chapters 6–22 deal with processes included in the Base module. Chapters 23–27 deal with procedures in the Advanced Statistics and Regression Modules, and Chapter 28, the analysis of residuals, draws from all three.
IBM SPSS STATISTICS BASE: Chapters 6 through 10 describe the most fundamental data analysis methods available, including frequencies, bar charts, histograms, and percentiles (Chapter 6); descriptive statistics such as means, medians, modes, skewness, and ranges (Chapter 7); crosstabulations and chi-square tests of independence (Chapter 8); subpopulation means (Chapter 9); and correlations between variables (Chapter 10).
The next group of chapters (Chapters 11 through 17) explains ways of testing for differences between subgroups within your data or showing the strength of relationships between a dependent variable and one or more independent variables through the use of t tests (Chapter 11); ANOVAs (Chapters 12, 13, and 14); linear, curvilinear, and multiple regression analysis (Chapters 15 and 16); and the most common forms of nonparametric tests (Chapter 17).
Reliability analysis (Chapter 18) is a standard measure used in research that involves multiple response measures; multidimensional scaling is designed to identify and model the structure and dimensions of a set of stimuli from dissimilarity data (Chapter 19); and then factor analysis (Chapter 20), cluster analysis (Chapter 21), and discriminant analysis (Chapter 22) all occupy stable and important niches in research conducted by scientists.
IBM SPSS ADVANCED STATISTICS AND REGRESSION: The next series of chapters deals with analyses that involve multiple dependent variables (SPSS calls these procedures General Linear Models; they are also commonly called MANOVAs or MANCOVAs). Included under the heading General Linear Model are simple and general factorial models and multivariate models (Chapter 23), and models with repeated measures or within-subjects factors (Chapter 24).
The next three chapters deal with procedures that are only infrequently performed, but they are described here because when these procedures are needed they are indispensable. Chapter 25 describes logistic regression analysis and Chapters 26 and 27 describe hierarchical and nonhierarchical log-linear models, respectively. As mentioned previously, Chapter 28 on residuals closes out the book.
1.5 An Introduction to the Example
A single data file is used in 17 of the first 19 chapters of this book. For more complex procedures it has been necessary to select different data files to reflect the particular procedures that are presented. Example data files are useful because often, things that appear to be confusing in the SPSS documentation become quite clear when you see an example of how they are done. Although only the most frequently used sample data file is described here, there are a total of 12 data sets that are used to demonstrate procedures throughout the book, in addition to data sets utilized in the exercises. Data files are available for download at www.spss-step-by-step.net . These files can be of substantial benefit to you as you practice some of the processes presented here without the added burden of having to input the data. We suggest that you make generous use of these files by trying different procedures and then comparing your results with those included in the output sections of different chapters.
The example has been designed so it can be used to demonstrate most of the statistical procedures presented here. It consists of a single data file used by a teacher who teaches three sections of a class with approximately 35 students in each section. For each student, the following information is recorded:
· ID number
· Name
· Gender
· Ethnicity
· Year in school
· Upper- or lower-division class person
· Previous GPA
· Section
· Whether or not he or she attended review sessions or did the extra credit
· The scores on five 10-point quizzes and one 75-point final exam
In Chapter 4 we describe how to create four new variables. In all presentations that follow (and on the data file available on the website), these four variables are also included:
· The total number of points earned
· The final percent
· The final grade attained
· Whether the student passed or failed the course
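For readers who like to see the arithmetic behind these four variables, the following is a minimal sketch in Python rather than SPSS. The maximum of 125 points follows from the five 10-point quizzes and the 75-point final described above; the letter-grade cutoffs, the rounding of the percent, the pass rule, and the function name derive are our assumptions for illustration only and are not taken from the grades file.

```python
# Hypothetical sketch: derive the four computed variables for one student.
# The grade cutoffs, rounding, and pass rule are assumptions for illustration.

MAX_POINTS = 5 * 10 + 75  # five 10-point quizzes plus one 75-point final


def derive(quizzes, final):
    """Return (total, percent, grade, passfail) for one student."""
    total = sum(quizzes) + final
    percent = round(100 * total / MAX_POINTS)
    # Assumed cutoffs: 90/80/70/60 percent for A/B/C/D, otherwise F.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            grade = letter
            break
    else:
        grade = "F"
    # Assumed rule: any grade other than F counts as passing.
    passfail = "P" if grade != "F" else "F"
    return total, percent, grade, passfail


print(derive([9, 8, 10, 7, 9], 60))  # (103, 82, 'B', 'P')
```

Chapter 4 describes how the same kind of computation is carried out within SPSS itself.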
The example data file (the entire data set is displayed at the end of Chapter 3) will also be used as the example in the introductory chapters (Chapters 2 through 5). If you enter the data yourself and follow the procedures described in these chapters, you will have a working example data file identical to that used through the first half of this book. Yes, the same material is recorded on the downloadable data files, but it may be useful for you to practice data entry, formatting, and certain data manipulations with this data set. If you have your own set of data to work with, all the better.
One final note: All of the data in the grades file are totally fictional, so any findings exist only because we created them when we made the file.
1.6 Typographical and Formatting Conventions
CHAPTER ORGANIZATION Chapters 2 through 5 describe IBM SPSS Statistics formatting and procedures, and the material covered dictates each chapter’s organization. Chapters 6 through 28 (the analysis chapters) are, with only occasional exceptions, organized identically. This format includes:
1. The Introduction in which the procedure that follows is described briefly and concisely. These introductions vary in length from one to seven pages depending on the complexity of the analysis being described.
2. The Step by Step section in which the actual steps necessary to accomplish particular analyses are presented. Most of the typographical and formatting conventions described in the following pages refer to the Step by Step sections.
3. The Output section, in which the results from analyses described earlier are displayed—often abbreviated. Text clarifies the meaning of the output, and all of the critical output terms are defined.
THE SCREENS Due to the very visual nature of SPSS, every chapter contains pictures of screens or windows that appear on the computer monitor as you work. The first picture from Chapter 6 (below) provides an example. These pictures are labeled “Screens” despite the fact that sometimes what is pictured is a screen (everything that appears on the monitor at a given time) and other times is a portion of a screen (a window, a dialog box, or something smaller). If the reader sees reference to Screen 13.3, she knows that this is simply the third picture in Chapter 13. The screens are typically positioned within breaks in the text (the screen icon and a title are included) and are used for sake of reference as procedures involving that screen are described. Sometimes the screens are separate from the text and labels identify certain characteristics of the screen (see the inside front cover for an example). Because screens take up a lot of space, frequently used screens are included on the inside front and back covers of this book. At other times, within a particular chapter, a screen from a different chapter may be cited to save space.
Screen 1.1 The Frequencies Window
Sometimes a portion of a screen or window is displayed (such as the menu bar included here) and is embedded within the text without a label.
The Step by Step boxes: Text that surrounds the screens may designate a procedure, but it is the Step by Step boxes that identify exactly what must be done to execute a procedure. The following box illustrates:
Sequence Step 3 means: “Beginning with Screen 1 (displayed on the inside front cover), click on the word File, move the cursor to Open, and then click the word Data. At this point a new window will open (Screen 2 on the inside front cover); type ‘grades.sav’ and then click the Open button, at which point a screen with your data file opens.” Notice that within brackets shortcuts are sometimes suggested: Rather than the File → Open → Data sequence, it is quicker to click the icon. Instead of typing grades.sav and then clicking Open, it is quicker to double click on the grades.sav (with or without the “.sav” suffix; this depends on your settings) file name. Items within Step by Step boxes include:
Screens: A small screen icon will be placed to the left of each group of instructions that are based on that screen. There are three different types of screen icons:
Other images with special meaning inside of Step by Step boxes include:
Sometimes fonts can convey information, as well:
Font | What it Means |
Monospaced font (Courier) | Any text within the boxes that is rendered in the Courier font represents text (numbers, letters, words) to be typed into the computer (rather than being clicked or selected). |
Italicized text | Italicized text is used for information or clarifications within the Step by Step boxes. |
Bold font | The bold font is used for words that appear on the computer screen. |
The groundwork is now laid. We wish you a pleasant journey through the exciting and challenging world of data analysis!
THIS CHAPTER deals with frequencies, graphical representation of frequencies (bar charts and pie charts), histograms, and percentiles. Each of these procedures is described below. Frequencies is one of the SPSS commands in which it is possible to access certain graphs directly (specifically, bar charts, pie charts, and histograms) rather than accessing them through the Graphs command. More information about editing these graphs is treated in some detail in Chapter 5. Bar charts or pie charts are typically used to show the number of cases (“frequencies”) in different categories. As such they clearly belong in a chapter on frequencies. Inclusion of histograms and percentiles seems a bit odd because they are most often used with a continuous distribution of values and are rarely used with categorical data. They are included here because the Frequencies command in SPSS is configured in such a way that, in addition to frequency information, you can also access histograms for continuous variables, certain descriptive information, and percentiles. The Descriptives command and descriptive statistics are described in Chapter 7; however, that procedure does not allow access to histograms or percentiles.
Frequencies is one of the simplest yet one of the most useful of all SPSS procedures. The Frequencies command simply sums the number of instances within a particular category: There were 56 males and 37 females. There were 16 Whites, 7 Blacks, 14 Hispanics, 19 Asians, and 5 others. There were 13 A’s, 29 B’s, 37 C’s, 7 D’s, and 3 F’s. Using the Frequencies command, SPSS will list the following information: value labels, the value code (the number associated with each level of a variable, e.g., female = 1, male = 2), the frequency, the percent of total for each value, the valid percent (percent after missing values are excluded), and the cumulative percent. These are each illustrated and described in the Output section.
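To make the relationship among frequency, percent, valid percent, and cumulative percent concrete, here is a small sketch in Python (not SPSS syntax). The function name frequency_table, the sample codes, and the use of None to mark a missing value are assumptions made for this illustration; the four columns follow the definitions given in the paragraph above.

```python
from collections import Counter


def frequency_table(values, missing=None):
    """Return rows of (value, frequency, percent, valid_percent, cum_percent).

    Percent uses all cases as the base; valid percent excludes missing cases.
    A final "missing" row is added only when missing cases are present.
    """
    valid = [v for v in values if v != missing]
    n_total, n_valid = len(values), len(valid)
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / n_total
        valid_percent = 100 * freq / n_valid
        cumulative += valid_percent
        rows.append((value, freq, percent, valid_percent, cumulative))
    n_missing = n_total - n_valid
    if n_missing:
        rows.append(("missing", n_missing, 100 * n_missing / n_total, None, None))
    return rows


# Hypothetical codes: 1 = female, 2 = male, None = missing.
gender = [1, 2, 2, 1, 2, None, 2, 1, 2, 2]
for row in frequency_table(gender):
    print(row)
```

In this made-up sample the percent column sums to 100 with the missing row included, while the valid percent column sums to 100 over the nine non-missing cases only.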
The Bar chart(s) option is used to create a visual display of frequency information. A bar chart should be used only for categorical (not continuous) data. The gender, ethnicity, and grade variables listed in the previous paragraph represent categorical data. Each of these variables divides the data set into distinct categories such as male, female; A, B, C, D, F; and others. These variables can be appropriately displayed in a bar chart. Continuous data contain a series of numbers or values such as scores on the final exam, total points, finishing times in a road race, weight in pounds of individuals in your class, and so forth. Continuous variables are typically represented graphically with histograms, our next topic.
For continuous data, the Histograms option will create the appropriate visual display. A histogram is used to indicate frequencies of a range of values. A histogram is used when the number of instances of a variable is too large to want to list all of them. A good example is the breakdown of the final point totals in a class of students. Since it would be too cumbersome to list all scores on a graph, it is more practical to list the number of subjects within a range of values, such as how many students scored between 60 and 69 points, between 70 and 79 points, and so forth.
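The idea of counting cases within ranges of values can be sketched in a few lines of Python. The scores below are invented, the 10-point bin width is simply the one used in the example above, and bin_counts is a hypothetical helper name; SPSS chooses its own bin boundaries unless you edit the chart.

```python
from collections import Counter

WIDTH = 10  # assumed bin width, giving ranges such as 60-69 and 70-79


def bin_counts(scores, width=WIDTH):
    """Count whole-number scores per bin, keyed by each bin's lower bound."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


totals = [62, 68, 71, 75, 79, 83, 88, 90, 95, 99]  # hypothetical point totals
for lower, count in bin_counts(totals).items():
    # Print a crude text histogram: one asterisk per student in the range.
    print(f"{lower}-{lower + WIDTH - 1}: {'*' * count}")
```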
The Percentile Values option will compute any desired percentiles for continuous data. Percentiles are used to indicate what percent of a distribution lies below (and above) a particular value. For instance if a score of 111 was at the 75th percentile, this would mean that 75% of values are lower than 111 and 25% of values are higher than 111. Percentiles are used extensively in educational and psychological measurement.
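The interpretation of a percentile can be checked directly by counting. The sketch below, in Python, uses invented scores and two hypothetical helper functions; it illustrates the definition only and makes no claim about the interpolation method SPSS applies when it computes percentile values.

```python
def percent_below(values, score):
    """Percent of values strictly lower than score."""
    return 100 * sum(v < score for v in values) / len(values)


def percent_above(values, score):
    """Percent of values strictly higher than score."""
    return 100 * sum(v > score for v in values) / len(values)


totals = [90, 95, 100, 104, 108, 110, 115, 120]  # hypothetical point totals
print(percent_below(totals, 111), percent_above(totals, 111))  # 75.0 25.0
```

With these eight scores, six fall below 111 and two fall above it, so a score of 111 sits at the 75th percentile in the sense described above.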
The file we use to illustrate frequencies, bar charts, histograms, and percentiles (pie charts are so intuitive we do not present them here) is the example described in the first chapter. The file is called grades.sav and has an N = 105. This analysis computes frequencies, bar charts, histograms, and percentiles utilizing the gender, ethnic, grade, and total variables.
To access the initial SPSS screen from the Windows display, perform the following sequence of steps:
Mac users: To access the initial SPSS screen, successively click the following icons:
After clicking the SPSS program icon, Screen 1 appears on the monitor.
Step 2: Create and name a data file or edit (if necessary) an already existing file (see Chapter 3).
Screens 1 and 2 (displayed on the inside front cover) allow you to access the data file used in conducting the analysis of interest. The following sequence accesses the grades.sav file for further analyses:
Whether first entering SPSS or returning from earlier operations the standard menu of commands across the top is required. As long as it is visible you may perform any analyses. It is not necessary for the data window to be visible.
After completion of Step 3 a screen with the desired menu bar appears. When you click a command (from the menu bar), a series of options will appear (usually) below the selected command. With each new set of options, click the desired item. The sequence to access frequencies begins at any screen with the menu of commands visible:
6.5.1 Frequencies
A screen now appears (below) that allows you to select variables for which you wish to compute frequencies. The procedure involves clicking the desired variable name in the box to the left and then pasting it into the Variable(s) (or “active”) box to the right by clicking the right-arrow button in the middle of the screen. If the desired variable is not visible, use the scroll bar arrows to bring it into view. To deselect a variable (to move it from the Variable(s) box back to the original list), click on the variable in the active box and the right arrow in the center will become a left arrow. Click on the left arrow to move the variable back. To clear all variables from the Variable(s) box, click the Reset button.
Screen 6.1 The Frequencies window
The following sequence of steps will allow you to compute frequencies for the variables ethnic, gender, and grade.
You have now selected the three variables associated with gender, ethnicity, and grades. By clicking the OK button, SPSS proceeds to compute frequencies. After a few moments the output will be displayed on the screen. The Output screen will appear every time an analysis is conducted (labeled Screen 6.2), and appears on the following page.
The results are now located in a window with the title Output# [Document#] – IBM SPSS Statistics Viewer at the top. To view the results, make use of the up and down arrows on the scroll bar. Partial results from the procedure described above are found in the Output section. More complete information about output screens, editing output, and pivot charts is included in Chapter 2 (pages 17–22, 34–39 for Mac users). If you wish to conduct further analyses with the same data set, the starting point is again Screen 6.1. Perform whichever of Steps 1–4 (usually Step 4 is all that is necessary) are needed to arrive at this screen.
Screen 6.2 SPSS Output Viewer
6.5.2 Bar Charts
To create bar charts of categorical data, the process is identical to sequence Step 5 (on the previous page), except that instead of clicking the final OK, you will click the Charts option (see Screen 6.1). At this point a new screen (Screen 6.3, below) appears: Bar charts, Pie charts, and Histograms are the types of charts offered. For categorical data you will usually choose Bar charts. You may choose Frequencies (the number of instances within each category) or Percentages (the percent of total for each category). After you click Continue, the Charts box disappears leaving Screen 6.1. A click on OK completes the procedure.
Screen 6.3 The Frequencies: Charts window
Screen 6.1 is the starting point for this procedure. Notice that we demonstrated a double click of the variable name to paste it into the active box (rather than a click on the right-arrow button).
After a few moments of processing time (several hours if you are working on a typical university network) the output screen will emerge. A total of three bar charts have been created, one describing the ethnic breakdown, another describing the gender breakdown, and a third dealing with grades. To see these three graphs simply requires scrolling down the output page until you arrive at the desired graph. If you wish to edit the graphs for enhanced clarity, double click on the graph and then turn to Chapter 5 to assist you with a number of editing options. The chart that follows (Screen 6.4) shows the bar chart for ethnicity.
Screen 6.4 A Sample Bar Chart
6.5.3 Histograms
Histograms may be accessed in the same way as bar charts. The distinction between bar charts and histograms is that histograms are typically used for display of continuous (not categorical) data. For the variables used above (gender, ethnicity, and grades), histograms would not be appropriate. We will here make use of a histogram to display the distribution for the total points earned by students in the class. Perform the following sequence of steps to create a histogram for total. Refer, if necessary, to Screens 6.1, 6.2, and 6.3 on previous pages for visual reference. The histogram for this procedure is displayed in the Output section.
This procedure begins at Screen 6.1. Perform whichever of Steps 1–4 (pages 102–103) are necessary to arrive at this screen. You may also need to click the Reset button before beginning.
Note the step where you click Display frequency tables to deselect that option. For categorical data, you will always keep this option since it constitutes the entire non-graphical output. For continuous data (the total variable in this case), a display of frequencies would be a list of about 70 items, indicating that 1 subject scored 45, 2 subjects scored 47, and so on up to the number of subjects who scored 125. This is rarely desired. If you click this option prior to requesting a histogram, a warning will flash indicating that there will be no output. The Show normal curve on histogram option allows a normal curve to be superimposed over the histogram.
6.5.4 Percentiles and Descriptives
Descriptive statistics are explained in detail in Chapter 7. Using the Frequencies command, under the Statistics option (see Screen 6.1), descriptive statistics and percentile values are available. When you click on the Statistics option, a new screen appears (Screen 6.5, below) that allows access to this additional information. Three different step sequences (on the following page) will explain (a) how to create a histogram and access descriptive data, (b) how to calculate a series of percentiles with equal spacing between each value, and (c) how to access specific numeric percentiles. All three sequences will utilize the total points variable.
Screen 6.5 The Frequencies: Statistics Window
For any of the three procedures below, the starting point is Screen 6.1. Perform whichever of Steps 1–4 (pages 102–103) are necessary to arrive at this screen. Step 5c gives steps to create a histogram for total points and also requests the mean of the distribution, the standard deviation, the skewness, and the kurtosis. Click the Reset button before beginning if necessary.
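As a rough illustration of the four descriptive values requested in this step, the following Python sketch computes a mean, a sample standard deviation, and bias-adjusted sample skewness and kurtosis. The formulas are one common textbook definition; we assume, but do not claim, that they correspond to what SPSS reports, and the function name describe and the sample data are hypothetical. Chapter 7 explains these statistics in detail.

```python
import statistics


def describe(values):
    """Return (mean, sd, skewness, kurtosis) for at least four values.

    Uses the n - 1 standard deviation and bias-adjusted sample formulas
    for skewness and excess kurtosis (an assumed, commonly used definition).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    z = [(v - mean) / sd for v in values]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return mean, sd, skew, kurt


print(describe([1, 2, 3, 4, 5]))  # symmetric data, so skewness is 0
```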
To calculate percentiles of the total variable for every 5th percentile value (e.g., 5th, 10th, 15th, etc.) it’s necessary to divide the percentile scale into 20 equal parts. Click Reset if necessary.
Note that when you type the 20 (or any number) it automatically writes over the default value of 10, already showing.
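The arithmetic behind the number of equal groups is simple: dividing the percentile scale into n equal parts yields n - 1 cut points. A short Python sketch (the function name cut_points is hypothetical) shows why 20 groups produce every 5th percentile and why the default of 10 produces every 10th:

```python
def cut_points(groups):
    """Percentile values that split the 0-100 scale into equal groups."""
    return [100 * k / groups for k in range(1, groups)]


print(cut_points(20))  # 19 values: 5.0, 10.0, ..., 95.0
print(cut_points(10))  # 9 values: 10.0, 20.0, ..., 90.0
```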
Finally, to designate particular percentile values (in this case, 2, 16, 50, 84, 98) perform the following sequence of steps. Click Reset if necessary.
Note: Quartile values (the 25th, 50th, and 75th percentiles) may be obtained quickly by clicking the Quartiles box (see Screen 6.5), clicking Continue, and then clicking OK (see Screen 6.1).
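The text does not say why the five values 2, 16, 50, 84, and 98 were chosen. One plausible reading, offered here as our assumption, is that they are the rounded percentiles of a normal distribution at 2 and 1 standard deviations below the mean, at the mean, and at 1 and 2 standard deviations above it. The Python sketch below checks that arithmetic with the standard library.

```python
from statistics import NormalDist

# Percent of a standard normal distribution lying below each z score,
# rounded to the nearest whole percentile.
z_scores = [-2, -1, 0, 1, 2]
percentiles = [round(100 * NormalDist().cdf(z)) for z in z_scores]
print(percentiles)  # [2, 16, 50, 84, 98]
```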
Results of the analysis (or analyses) that have just been conducted require a window that displays the standard commands (File Edit Data Transform Analyze …) across the top. A typical print procedure is shown below beginning with the standard output screen (Screen 1, inside back cover).
To print results, from the Output screen perform the following sequence of steps:
To exit you may begin from any screen that shows the File command at the top.
Note: After clicking Exit, there will frequently be small windows that appear asking if you wish to save or change anything. Simply click each appropriate response.
Frequencies, Histograms, Descriptives, and Percentiles
In the output, due to space constraints, we often present results of analyses in a more space-conserving format than is typically done by SPSS. We use identical terminology as that used in the SPSS output and hope that minor formatting differences do not detract from understanding.
6.7.1 Frequencies
What follows are partial results (in a slightly different format) from sequence Step 5, page 103.
The number of subjects in each category is self-explanatory. Definitions of other terms follow:
Term | Definition/Description |
Value label | Names for levels of a variable. |
Value | The number associated with each level of the variable (just in front of each label). |
Frequency | Number of data points for a variable or level. |
Percent | The percent for each component part, including missing values. If there were missing values, they would be listed in the last row as missing along with the frequency and percent of missing values. The total would still sum to 100.0%. |
Valid percent | Percent of each value excluding missing values. |
Cum percent | Cumulative percentage of the Valid percent. |
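The relationship among Percent, Valid percent, and Cum percent may be easier to see in a small calculation. The sketch below uses hypothetical responses (with None standing in for a missing value) and is not a reproduction of SPSS output.

```python
from collections import Counter

# Hypothetical responses; None marks a missing value.
values = [1, 2, 2, 3, 3, 3, None, 2, 1, None]

counts = Counter(v for v in values if v is not None)
n_total = len(values)            # all cases, including missing
n_valid = sum(counts.values())   # cases with a non-missing value

rows = []
cumulative = 0.0
for value in sorted(counts):
    freq = counts[value]
    percent = 100 * freq / n_total         # base includes missing values
    valid_percent = 100 * freq / n_valid   # base excludes missing values
    cumulative += valid_percent            # running total of valid percent
    rows.append((value, freq, percent, valid_percent, cumulative))

missing_percent = 100 * (n_total - n_valid) / n_total

print("Value Freq Percent  Valid    Cum")
for row in rows:
    print("{:>5} {:>4} {:7.1f} {:6.1f} {:6.1f}".format(*row))
print("Missing {:>2} {:7.1f}".format(n_total - n_valid, missing_percent))
```

Note that the Percent column plus the missing row sums to 100, while the Cum percent column reaches 100 on the last valid value.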
6.7.2 Histograms
What follows is output from sequence Step 5b on page 106. To produce an identical graph you will need to perform the edits described on pages 96–97.
Note that on the horizontal axis (graph below) the border values of each of the bars are indicated. This makes for clear interpretation since it is easy to identify that, for instance, 11 students scored between 90 and 95 points, 20 students scored between 95 and 100 points, and 8 students scored between 100 and 105 points. The graph has been edited to create the 5-point increments for bars. For creation of an identical graph several of the editing options would need to be applied. Please see Chapter 5 to assist you with this. A normal curve is superimposed on the graph due to selecting the Show normal curve on histogram option.
6.7.3 Descriptives and Percentiles
What follows is complete output (slightly different format) from sequence Step 5d on page 107.
Descriptives
Percentiles
For Percentiles: For the total points variable, 5% of values fall below 70 points and 95% of values are higher than 70 points; 10% of values fall below 79.6 points and 90% are higher, and so forth.
Descriptive information is covered in Chapter 7 so we will not discuss those terms here. Note that when the skewness and kurtosis are requested, the standard errors of those two measures are also included.
Answers to selected exercises can be downloaded at www.spss-step-by-step.net.
Notice that data files other than the grades.sav file are being used here. Please refer to the Data Files section starting on page 364 to acquire all necessary information about these files and the meaning of the variables. As a reminder, all data files are downloadable from the web address shown above.
1. Using the divorce.sav file display frequencies for sex, ethnic, and status. Print output to show frequencies for all three; edit output so it fits on one page. On a second page, include three bar graphs of these data and provide labels to clarify what each one means.
2. Using the graduate.sav file display frequencies for motive, stable, and hostile. Print output to show frequencies for all three; edit output so it fits on one page. Note: This type of procedure is typically done to check for accuracy of data. Motivation (motive), emotional stability (stable), and hostility (hostile) are scored on 1- to 9-point scales. You are checking to see if you have, by mistake, entered any 0s or 99s.
3. Using the helping3.sav file compute percentiles for thelplnz (time helping, measured in z scores) and tqualitz (quality of help measured in z scores). Use percentile values 2, 16, 50, 84, 98. Print output and circle values associated with percentiles for thelplnz; box percentile values for tqualitz. Edit output so it fits on one page.
4. Using the helping3.sav file compute percentiles for age. Compute every 10th percentile (10, 20, 30, etc.). Edit (if necessary) to fit on one page.
5. Using the graduate.sav file display frequencies for gpa, areagpa, and grequant. Compute quartiles for these three variables. Edit (if necessary) to fit on one page.
6. Using the grades.sav file create a histogram for final. Include the normal curve option. Create a title for the graph that makes clear what is being measured. Perform the edits on pages 96–97 so the borders for each bar are clear.
Chapter 7 Descriptive Statistics
Descriptives is another frequently used SPSS procedure. Descriptive statistics are designed to give you information about the distributions of your variables. Within this broad category are measures of central tendency (Mean, Median, Mode), measures of variability around the mean (Std deviation and Variance), measures of deviation from normality (Skewness and Kurtosis), information concerning the spread of the distribution (Maximum, Minimum, and Range), and information about the stability or sampling error of certain measures, including standard error (S.E.) of the mean (S.E. mean), S.E. of the kurtosis, and S.E. of the skewness (included by default when skewness and kurtosis are requested). Using the Descriptives command, it is possible to access all of these statistics or any subset of them. In this introductory section of the chapter, we begin with a brief description of statistical significance (included in all forms of data analysis) and the normal distribution (because most statistical procedures require normally distributed data). Then each of the statistics identified above is briefly described and illustrated.
All procedures in the chapters that follow involve testing the significance of the results of each analysis. Although statistical significance is not employed in the present chapter it was thought desirable to cover the concept of statistical significance (and normal distributions in the section that follows) early in the book.
Significance is typically designated with words such as “significance,” “statistical significance,” or “probability.” The latter word is the source of the letter that represents significance, the letter “p.” The p value identifies the likelihood that an outcome at least as extreme as the one observed would occur by chance alone, that is, if there were no true effect. For instance, group A may score an average of 37 on a scale of depression while group B scores 41 on the same scale. If a t test determines that group A differs from group B at a p = .01 level of significance, it may be concluded that a difference this large would arise by chance only about 1 time in 100 if the two groups did not truly differ, which is taken as strong evidence that the discrepancy in scores is a reliable finding.
Regardless of the type of analysis the p value identifies the likelihood that a particular outcome occurs by chance. A Chi-square analysis identifies whether observed values differ significantly from expected values; a t test or ANOVA identifies whether the mean of one group differs significantly from the mean of another group or groups; correlations and regressions identify whether two or more variables are significantly related to each other. In all instances a significance value will be calculated identifying the likelihood that a particular outcome is or is not reliable. Within the context of research in the social sciences, nothing is ever “proved.” It is demonstrated or supported at a certain level of likelihood or significance. The smaller the p value, the greater the likelihood that the findings are valid.
Social scientists have generally accepted that if the p value is less than .05 then the result is considered statistically significant. Thus, when there is less than a 1 in 20 probability that a certain outcome occurred by chance, then that result is considered statistically significant. Another frequently observed convention is that when a significance level falls between .05 and .10, the result is considered marginally significant. When the significance level falls far below .05 (e.g., .001, .0001, etc.) the smaller the value, the greater confidence the researcher has that his or her findings are valid.
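One way to make the idea of “occurring by chance” concrete is a permutation test, sketched below in Python. This is a different technique from the t test mentioned above and is shown only as an illustration. The scores are made up (chosen so that the group means are 37 and 41), and the function name permutation_p_value is hypothetical. The group labels are shuffled many times, and the p value is the share of shuffles that produce a mean difference at least as large as the one actually observed.

```python
import random
import statistics

# Hypothetical depression scores (made up for illustration only).
group_a = [35, 38, 36, 39, 37, 34, 40, 37]   # mean 37
group_b = [41, 43, 39, 42, 40, 44, 38, 41]   # mean 41

def permutation_p_value(a, b, n_shuffles=2000, seed=0):
    """Share of random relabelings whose absolute mean difference is at
    least as large as the observed absolute mean difference."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

p = permutation_p_value(group_a, group_b)
print("p =", p, "| significant at .05:", p < 0.05)
```

A small p value here means that random relabeling rarely produces a difference as large as the observed one.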
When one writes up the findings of a particular study, certain statistical information and p values are always included. Whether or not a significant result has occurred is the key focus of most studies that involve statistics.
Many naturally occurring phenomena produce distributions of data that approximate a normal distribution. Some examples include the height of adult humans in the world, the weight of collie dogs, the scoring averages of players in the NBA, and the IQs of residents of the United States. In all of these distributions, there are many mid-range values (e.g., 60–70 inches, 22–28 pounds, 9–14 points, 90–110 IQ points) and few extreme values (e.g., 30 inches, 80 pounds, 60 points, 12 IQ points). There are other distributions that approximate normality but deviate in predictable ways. For instance, times of runners in a 10-kilometer race will have few values less than 30 minutes (none less than 26:17), but many values greater than 40 minutes. The majority of values will lie above the mean (average) value. This is called a negatively skewed distribution. Then there is the distribution of ages of persons living in the United States. While there are individuals who are 1 year old and others who are 100 years old, there are far more 1-year-olds, and in general the population has more values below the mean than above the mean. This is called a positively skewed distribution. It is possible for distributions to deviate from normality in other ways, some of which are described in this chapter.
A normal distribution is symmetric about the mean or average value. In a normal distribution, 68% of values will lie between plus-or-minus (±) 1 standard deviation (described below) of the mean, 95.5% of values will lie between ± 2 standard deviations of the mean, and 99.7% of values will lie between ± 3 standard deviations of the mean. A normal distribution is illustrated in the figure below.
A final example will complete this section. The average (or mean) height of an American adult male is 69 inches (5′ 9″) with a standard deviation of 4 inches. Thus, 68% of American men are between 5′ 5″ and 6′ 1″ (69 ± 4); 95.5% of American men are between 5′ 1″ and 6′ 5″ (69 ± 8), and 99.7% of American men are between 4′ 9″ and 6′9″ (69 ± 12) in height (don’t let the NBA fool you!).
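The percentages quoted above can be checked with the normal distribution functions in the Python standard library. The sketch below assumes heights are exactly normally distributed with the mean of 69 inches and standard deviation of 4 inches given in the example; real height data will only approximate this.

```python
from statistics import NormalDist

# Mean 69 inches and standard deviation 4 inches, as in the example above.
heights = NormalDist(mu=69, sigma=4)

shares = {}
for k in (1, 2, 3):
    low, high = 69 - 4 * k, 69 + 4 * k
    # Proportion of the distribution lying within k standard deviations.
    shares[k] = heights.cdf(high) - heights.cdf(low)
    print(f"{low} to {high} inches: {shares[k]:.2%}")
```

The three proportions round to the 68%, 95.5%, and 99.7% figures used in the text.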
7.3 Measures of Central Tendency
The Mean is the average value of the distribution, or, the sum of all values divided by the number of values. The mean of the distribution [3 5 7 5 6 8 9] is:
(3 + 5 + 7 + 5 + 6 + 8 + 9)/7 = 6.14
The Median is the middle value of the distribution. The median of the distribution [3 5 7 5 6 8 9], is 6, the middle value (when reordered from small to large, 3 5 5 6 7 8 9). If there is an even number of values in a distribution, then there will be two middle values. In that case the average of those two values is the median.
The Mode is the most frequently occurring value. The mode of the distribution [3 5 7 5 6 8 9] is 5, because 5 occurs most frequently (twice, all other values occur only once).
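As a quick check, the three measures of central tendency for the distribution [3 5 7 5 6 8 9] can be computed with the Python standard library. This is offered only as a sketch for verifying the hand calculations, not as part of the SPSS procedure.

```python
import statistics

values = [3, 5, 7, 5, 6, 8, 9]

mean = statistics.mean(values)      # sum of all values / number of values
median = statistics.median(values)  # middle value after sorting
mode = statistics.mode(values)      # most frequently occurring value

print(round(mean, 2), median, mode)  # 6.14 6 5

# With an even number of values, the median is the average of the two middle values.
even_median = statistics.median([3, 5, 5, 6, 7, 8])  # (5 + 6) / 2
```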
7.4 Measures of Variability Around the Mean
The Variance is the sum of squared deviations from the mean divided by N − 1. The variance for the distribution [3 5 7 5 6 8 9] (the same numbers used above to illustrate the mean) is:
[(3 − 6.14)² + (5 − 6.14)² + (7 − 6.14)² + (5 − 6.14)² + (6 − 6.14)² + (8 − 6.14)² + (9 − 6.14)²]/6 = 4.1429
Variance is used mainly for computational purposes. Standard deviation is the more commonly used measure of variability.
The Standard deviation is the positive square root of the variance. For the distribution [3 5 7 5 6 8 9], the standard deviation is the square root of 4.1429, or 2.0354.
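The variance and standard deviation for the same distribution can be verified as follows. The sketch spells out the N − 1 calculation by hand and then compares it with the standard library functions, which also divide by N − 1.

```python
import math
import statistics

values = [3, 5, 7, 5, 6, 8, 9]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean, divided by N - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in values)
variance_by_hand = sum_sq_dev / (n - 1)
std_dev_by_hand = math.sqrt(variance_by_hand)

# The library functions use the same N - 1 denominator.
variance = statistics.variance(values)
std_dev = statistics.stdev(values)

print(round(variance, 4), round(std_dev, 4))  # 4.1429 2.0354
```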
7.5 Measures of Deviation from Normality
Kurtosis is a measure of the “peakedness” or the “flatness” of a distribution. A kurtosis value near zero (0) indicates a shape close to normal. A positive value for the kurtosis indicates a distribution more peaked than normal. A negative kurtosis indicates a shape flatter than normal. An extreme negative kurtosis (e.g., < −5.0) indicates a distribution where more of the values are in the tails of the distribution than around the mean. A kurtosis value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on the particular application. Remember that these values are only guidelines. In other settings different criteria may arise, such as significant deviation from normality (outside ±2 × the standard error). Similar rules apply to skewness.
Skewness measures to what extent a distribution of values deviates from symmetry around the mean. A value of zero (0) represents a symmetric or evenly balanced distribution. A positive skewness indicates a greater number of smaller values (sounds backward, but this is correct). A negative skewness indicates a greater number of larger values. As with kurtosis, a skewness value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on your application.
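The sketch below computes skewness and kurtosis with the bias-adjusted sample formulas that are commonly used by statistical packages, together with the usual formulas for their standard errors. That these are the exact formulas SPSS applies is an assumption here, and the function name skewness_kurtosis is hypothetical. The example data are made up to show a positive skew: most values are small, with one large value in the right tail.

```python
import math
import statistics

def skewness_kurtosis(values):
    """Return (skewness, SE of skewness, kurtosis, SE of kurtosis).

    Uses bias-adjusted sample formulas; kurtosis is 'excess' kurtosis,
    so a normal distribution gives a value near 0.
    Requires at least 4 values that are not all identical.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # N - 1 in the denominator
    z = [(x - mean) / sd for x in values]    # standardized deviations

    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # The standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt

# Hypothetical data: many small values and a long right tail.
right_tailed = [1, 1, 1, 2, 2, 3, 10]
skew, se_skew, kurt, se_kurt = skewness_kurtosis(right_tailed)
print("skewness is positive:", skew > 0)

# One of the criteria mentioned above: outside +/- 2 x the standard error.
print("skewness outside 2 x S.E.:", abs(skew) > 2 * se_skew)
```

Because the standard errors depend only on N, every variable in a file with no missing values will show the same standard error of skewness and the same standard error of kurtosis.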
7.6 Measures for Size of the Distribution
For the distribution [3 5 7 5 6 8 9], the Maximum value is 9, the Minimum value is 3, and the Range is 9 − 3 = 6. The Sum of the scores is 3 + 5 + 7 + 5 + 6 + 8 + 9 = 43.
7.7 Measures of Stability: Standard Error
SPSS computes the Standard errors for the mean, the kurtosis, and the skewness. Standard error is designed to be a measure of stability or of sampling error. The logic behind standard error is this: If you take a random sample from a population, you can compute the mean, a single number. If you take another sample of the same size from the same population you can again compute the mean—a number likely to be slightly different from the first number. If you collect many such samples, the standard error of the mean is the standard deviation of this sampling distribution of means. A similar logic is behind the computation of standard error for kurtosis or skewness. A small value (what is “small” depends on the nature of your distribution) indicates greater stability or smaller sampling error.
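The logic of repeated sampling can be illustrated with a short simulation. The sketch below first computes the standard error of the mean for the distribution [3 5 7 5 6 8 9] using the standard formula (standard deviation divided by the square root of N). It then draws many samples from a simulated population (an assumption made purely for illustration) and shows that the standard deviation of the sample means is close to what the formula predicts.

```python
import math
import random
import statistics

# Standard error of the mean for the chapter's example distribution.
values = [3, 5, 7, 5, 6, 8, 9]
se_mean = statistics.stdev(values) / math.sqrt(len(values))
print(round(se_mean, 4))  # 0.7693

# Simulated population (hypothetical): mean 100, standard deviation 15.
rng = random.Random(1)
population = [rng.gauss(100, 15) for _ in range(10000)]

# Draw many samples of the same size and record each sample mean.
sample_size = 25
sample_means = [
    statistics.mean(rng.sample(population, sample_size))
    for _ in range(2000)
]

# The standard deviation of those means is the empirical standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = statistics.pstdev(population) / math.sqrt(sample_size)
print(round(empirical_se, 2), round(theoretical_se, 2))
```

The two printed values should be close to each other (roughly 3 for these settings), and both shrink as the sample size grows.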
The file we use to illustrate the Descriptives command is our example described in the first chapter. The data file is called grades.sav and has an N = 105. This analysis computes descriptive statistics for variables gpa, total, final, and percent.
7.8.1 Descriptives
To access the initial SPSS screen from the Windows display, perform the following sequence of steps:
Mac users: To access the initial SPSS screen, successively click the following icons:
After clicking the SPSS program icon, Screen 1 appears on the monitor.
Step 2: Create and name a data file or edit (if necessary) an already existing file (see Chapter 3).
Screens 1 and 2 (displayed on the inside front cover) allow you to access the data file used in conducting the analysis of interest. The following sequence accesses the grades.sav file for further analyses:
Whether first entering SPSS or returning from earlier operations the standard menu of commands across the top is required. As long as it is visible you may perform any analyses. It is not necessary for the data window to be visible.
After completion of Step 3 a screen with the desired menu bar appears. When you click a command (from the menu bar), a series of options will appear (usually) below the selected command. With each new set of options, click the desired item. The sequence to access Descriptive Statistics begins at any screen with the menu of commands visible:
A new screen now appears (below) that allows you to select variables for which you wish to compute descriptives. The procedure involves clicking the desired variable name in the box to the left and then pasting it into the Variable(s) (or “active”) box to the right by clicking the right arrow button in the middle of the screen. If the desired variable is not visible, use the scroll bar arrows to bring it into view. To deselect a variable (that is, to move it from the Variable(s) box back to the original list), click on the variable in the active box and the right arrow in the center will become a left arrow. Click on the left arrow to move the variable back. To clear all variables from the active box, click the Reset button.
Screen 7.1 The Descriptives Window
The only check box on the initial screen, Save standardized values as variables, will convert all designated variables (those in the Variable(s) box) to z scores. The original variables will remain, but new variables with a “z” attached to the front will be included in the list of variables. For instance, if you click the Save standardized values as variables option, and the variable final was in the Variable(s) box, it would be listed in two ways: final in the original scale and zfinal for the same variable converted to z scores. You may then do analyses with either the original variable or the variable converted to z scores. Recall that z scores are values that have been mathematically transformed to create a distribution with a mean of zero and a standard deviation of one. See the glossary for a more complete definition. Also note that for non-mouse users, the SPSS people have cleverly underlined the “z” in the word “standardized” as a gentle reminder that standardized scores and z scores are the same thing.
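The conversion to z scores is simple to sketch: subtract the mean from each value and divide by the standard deviation. The scores below are hypothetical (not from grades.sav), and the use of the N − 1 standard deviation is an assumption about how the standardized values are computed.

```python
import statistics

# Hypothetical final exam scores (NOT from grades.sav).
final = [55, 62, 70, 48, 66, 59, 73]

mean = statistics.mean(final)
sd = statistics.stdev(final)   # assumption: N - 1 standard deviation

# zfinal: the same variable expressed in standard deviation units.
zfinal = [(x - mean) / sd for x in final]

# The standardized variable has a mean of zero and a standard deviation of one.
print(round(statistics.mean(zfinal), 6), round(statistics.stdev(zfinal), 6))
```

A z score of 1.0 therefore identifies a value one standard deviation above the mean of the original variable.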
To create a table of the default descriptives (mean, standard deviation, maximum, minimum) for the variables gpa and total, perform the following sequence of steps:
If you wish to calculate more than the four default statistics, after selecting the desired variables and before clicking OK, it is necessary to click the Options button (at the bottom of Screen 7.1). Here every descriptive statistic presented earlier in this chapter is included with a couple of exceptions: Median and mode are accessed through the Frequencies command only. See Chapter 6 to determine how to access these values. Also, the standard errors (“S.E.”) of the kurtosis and skewness are not included. This is because when you click either kurtosis or skewness, the standard errors of those values are automatically included. To select the desired descriptive statistics, the procedure is simply to click (so as to leave a check mark in the box to the left of the desired value) the descriptive statistics you wish. This is followed by a click of Continue and OK. The Display order options include (a) Variable list (the default, in the same order as displayed in the data editor), (b) Alphabetic (names of variables ordered alphabetically), (c) Ascending means (ordered from smallest mean value to largest mean value in the output), and (d) Descending means (from largest to smallest).
Screen 7.2 The Descriptives: Options Window
To select the variables final, percent, gpa, and total, and then select all desired descriptive statistics, perform the following sequence of steps. Press the Reset button if there are undesired variables in the active box.
Upon completion of either Step 5 or Step 5a, Screen 7.3 will appear (below). The results of the just-completed analysis are included in the top window labeled Output#[Document#] – IBM SPSS Statistics Viewer. Click on the maximize button to the right of this title if you wish the output to fill the entire screen and then make use of the arrows on the scroll bar to view the results. Even when viewing output, the standard menu of commands is still listed across the top of the window. Further analyses may be conducted without returning to the data screen. Partial output from this analysis is included in the Output section.
Screen 7.3 SPSS Output Viewer Window
Results of the analysis (or analyses) that have just been conducted require a window that displays the standard commands (File Edit Data Transform Analyze …) across the top. A typical print procedure is shown on the following page beginning with the standard output screen (Screen 1, inside back cover).
To print results, from the Output screen perform the following sequence of steps:
To exit you may begin from any screen that shows the File command at the top.
Note: After clicking Exit, there will frequently be small windows that appear asking if you wish to save or change anything. Simply click each appropriate response.
7.10.1 Descriptive Statistics
What follows is output from sequence Step 5a, page 118. Notice that the statistics requested include the N, the Mean, the Standard Deviation, the Variance, the Skewness, and the Kurtosis. The Standard Errors of the Skewness and Kurtosis are included by default.
IBM SPSS Statistics: Descriptive Statistics
First observe that in this display the entire output fits neatly onto a single page or is entirely visible on the screen. This is rarely the case. When more extensive output is produced, make use of the up, down, left, and right scroll bar arrows to move to the desired place. You may also use the index in the left window to move to particular output more quickly. Notice that all four variables fall within the “excellent” range as acceptable variables for further analyses; the skewness and kurtosis values all lie between ±1.0. All terms are identified and described in the introductory portion of the chapter. The only undefined word is listwise. This means that any subject that has a missing value for any variable has been deleted from the analysis. Since in the grades.sav file there are no missing values, all 105 subjects are included.
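The meaning of listwise can be sketched with a few hypothetical records in which None marks a missing value. Only subjects with a value on every selected variable are counted. The records and variable values below are made up for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"gpa": 3.2, "total": 98, "final": 61},
    {"gpa": None, "total": 105, "final": 70},   # missing gpa
    {"gpa": 2.8, "total": 87, "final": None},   # missing final
    {"gpa": 3.6, "total": 110, "final": 72},
]
variables = ["gpa", "total", "final"]

# Listwise: keep a subject only if no selected variable is missing.
complete = [r for r in records if all(r[v] is not None for v in variables)]
valid_n_listwise = len(complete)

print(valid_n_listwise, "of", len(records), "subjects have complete data")
```

In a file with no missing values, such as grades.sav, the listwise N equals the total number of subjects.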
Answers to selected exercises may be downloaded at www.spss-step-by-step.net.
Notice that data files other than the grades.sav file are being used here. Please refer to the Data Files section starting on page 362 to acquire all necessary information about these files and the meaning of the variables. As a reminder, all data files are downloadable from the web address shown above.
1. Using the grades.sav file select all variables except lastname, firstname, grade, and passfail. Compute descriptive statistics, including mean, standard deviation, kurtosis, and skewness. Edit so that you eliminate Std. Error (Kurtosis) and Std. Error (Skewness) making your chart easier to interpret. Edit the output to fit on one page.
· Draw a line through any variable for which descriptives are meaningless (either they are categorical or they are known to not be normally distributed).
· Place an “*” next to variables that are in the ideal range for both skewness and kurtosis.
· Place an X next to variables that are acceptable but not excellent.
· Place a ψ next to any variables that are not acceptable for further analysis.
2. Using the divorce.sav file select all variables except the indicators (for spirituality, sp8–sp57, for cognitive coping, cc1–cc11, for behavioral coping, bc1–bc12, for avoidant coping, ac1–ac7, and for physical closeness, pc1–pc10). Compute descriptive statistics, including mean, standard deviation, kurtosis, and skewness. Edit so that you eliminate Std. Error (Kurtosis) and Std. Error (Skewness) and your chart is easier to interpret. Edit the output to fit on two pages.
· Draw a line through any variable for which descriptives are meaningless (either they are categorical or they are known to not be normally distributed).
· Place an “*” next to variables that are in the ideal range for both skewness and kurtosis.
· Place an X next to variables that are acceptable but not excellent.
· Place a ψ next to any variables that are not acceptable for further analysis.
3. Create a practice data file that contains the following variables and values:
Compute: the mean, the standard deviation, and variance and print out on a single page.