# UNDERSTANDING STATISTICAL CONCEPTS


## Final Exam

Focus of the Final Exam

The purpose of the Final Exam is to assess your understanding of the main statistical concepts covered in this course and to evaluate your ability to critically review a quantitative research article.  The exam will consist of two parts: Part I includes three essay questions and Part II includes a research critique.  All of your responses should be included in a single Word document for submission.

Please include the following general headings for each section of the written exam within your Word document:

Part I: Essay Questions

1. Essay 1

2. Essay 2

3. Essay 3

Part II: Research Study Critique

1.   Introduction

2.   Methods

3.   Results

4.   Discussion

Your complete Word document must include a title page with the following:

1. Student’s name

2. Course name and number

3. Instructor’s name

4. Date submitted

Part I: Essay Questions

There are three essay questions in this section. You must answer all three questions. The length of each essay should be one to two double-spaced pages (excluding title and reference pages). Use 12-point font and format your paper with regular 1-inch margins. Do not include the essay prompt in your document. It will not count toward the length requirement for your essays.

Essay 1

A group of researchers conducted an experiment to determine which vaccine is more effective for preventing the flu. They tested two different types of vaccines: a shot and a nasal spray. To test the effectiveness, 1000 participants were randomly selected, with 500 people getting the shot and 500 the nasal spray. Of the 500 people who were treated with the shot, 80 developed the flu and 420 did not. Of the people who were treated with the nasal spray, 120 people developed the flu and 380 did not. The level of significance was set at .05. The proportion of people treated with the shot who developed the flu was .16, and the proportion of people treated with the nasal spray who developed the flu was .24. The calculated p value = .0008.

For this essay, describe the statistical approaches (e.g., identify the hypotheses and research methods) used in this excerpt from a research study. Interpret the statistical results and examine the limitations of the statistical methods. Finally, evaluate the research study as a whole and apply what you have learned about hypothesis testing and inferential statistics by discussing how you might conduct a follow-up study.

· Describe the research question for this experiment.

· What were the null and alternative hypotheses?

· Were the results of this test statistically significant?

· If so, why were they significant?

· Would the researchers reject or fail to reject the null hypothesis?

· Do the results provide sufficient evidence to support the alternative hypothesis?

· What are some possible limitations to this study?

· Describe the difference between practical and statistical significance.

Essay 2

A researcher has investigated the relationship between IQ and grade point average (GPA) and found the correlation to be .75.

For this essay, critique the results and interpretation of a correlational study.

· Evaluate the correlational result and identify the strength of the correlation.

·  Examine the assumptions and limitations of the possible connection between the researcher’s chosen variables.

· Identify and describe other statistical tests that could be used to study this relationship.

· How strong is this correlation?

· Is this a positive or negative correlation?

· What does this correlation mean?

· Does this correlation imply that individuals with high Intelligence Quotients (IQ) have high Grade Point Averages (GPA)?

· Does this correlation provide evidence that high IQ causes GPA to go higher?

· What other variables might be influencing this relationship?

· What is the connection between correlation and causation?

· What are some of the factors that affect the size of this correlation?

· Is correlation a good test for predicting GPA?

· If not, what statistical tests should a researcher use, and why?

Essay 3

A researcher has recorded the reaction times of 20 individuals on a memory assessment. The following table indicates the individual times:

 2.2 4.7 7.3 4.1 9.5 15.2 4.3 9.5 2.7 3.1 9.2 2.9 8.2 7.6 3.5 2.5 9.3 4.8 8.5 8.1

In this essay, demonstrate your ability to organize data into meaningful sets, calculate basic descriptive statistics, interpret the results, and evaluate the effects of outliers and changes in the variables.  You may use Excel, one of the many free online descriptive statistics calculators, or calculate the values by hand and/or with a calculator.

Next, separate the data into two groups of 10; one group will be the lower reaction times, and the second group will be the higher reaction times.  Then, address the following points in your essay response:

· Calculate the sum, mean, mode, median, standard deviation, range, skew, and kurtosis for each group.

· How do the two groups differ?

· Are there any outliers in either data group?

· What effect does an outlier have on a sample?

Lastly, double each sample by repeating the same 10 data points in each group.  You will have a total of 20 data points for each group. After completing this, address the following in your essay response:

· Calculate the following for the new data groups: sum, mean, mode, median, standard deviation, range, skew, and kurtosis.

· Did any of the values change?

· How does sample size affect those values?

Part II: Research Study Critique

In this second portion of the Final Exam, you will identify and critically evaluate a quantitative research article based on a social science topic. Your selected article must include a research question(s) and/or hypothesis(es) and utilize statistical analyses covered in the course. The article must be peer-reviewed and published within the last 10 years.

In the body of your critique, describe the statistical approaches used, the variables included, the hypothesis(es) proposed, and the interpretation of the results.   In your conclusion, suggest other statistical approaches that could have been used and, if appropriate, suggest alternative interpretations of the results.  This process will allow you to apply the concepts learned throughout the course in the interpretation of actual scientific research. Your critique must include the following sections:

1. Introduction: This section will include a general introduction of the quantitative study from a peer-reviewed source published within the last 10 years.  The research questions and/or hypothesis(es) as well as the purpose of the study should be clearly defined.

2. Methods: Describe and evaluate the procedures and methods of data collection, measures/instruments used, the participants and how they were selected, and the statistical techniques used.

3. Results: In this section describe and critique the results presented in the study.

4. Discussion: Discuss and evaluate the efficacy of the results presented in the study. Address the strengths, weaknesses, and limitations of the study, and suggest future research directions. Include additional forms of statistical analyses as part of the suggestions for future research.

The Research Study Critique:

1. Must be three to four double-spaced pages in length (excluding title and reference pages) and formatted according to APA style as outlined in the Ashford Writing Center.  Use 12-point font, with 1-inch margins.

2. Must use the sections and headings described above.

3. Must address the article with critical thought by examining, reflecting on, and evaluating the article from an objective viewpoint and by using facts to support your argument. Refer to the Critical Thinking Community website for further assistance.

4. Must end with a conclusion that summarizes your critical evaluation.

5. Must use one quantitative research study from a peer-reviewed source that was published within the last 10 years.

6. Must properly cite the source article in APA style as outlined in the Ashford Writing Center.

7. Must include a separate reference page, formatted according to APA style as outlined in the Ashford Writing Center.

Week Five Final Project: Understanding Statistical Concepts and Applications

Introduction

Part I: Essay Questions

Essay One

Essay Two

Essay Three

Part II: Research Study Critique

Introduction

Methods

Results

Discussion

Conclusion

References

Appendices

Appendix A Essay Three Data Set

Appendix B Essay Three Data Set Group A

Appendix C Essay Three Data Set Group B

Appendix D Essay Three Data Set Group A Doubled

Appendix E Essay Three Data Set Group B Doubled

Week Five Final Project: Understanding Statistical Concepts and Applications

Many psychology students are surprised to find that mathematical and statistical courses are required to obtain the degrees needed to become a helping professional. They often question the necessity or relevance of such courses and pose the argument that understanding human behavior and treating mental illness have little or nothing to do with statistics save for work in research. Students holding this position are frequently surprised to learn that all areas of psychology from industrial to practical relate to statistics in significant ways. In fact, statistics have been instrumental in unlocking many of the mysteries of human psychology. Through the important area of research, statistics help mental health care providers understand the causes, treatments, and prevalence of various illnesses both common and rare. Within this paper, the integral part statistical analysis plays in the field of psychology will be explored and the different methods of statistical analysis will be identified and discussed.

Part I: Essay Questions

Essay One

Researchers conducted an experimental study to determine whether a flu vaccine was more effective when administered as a nasal spray or as an injection. The research question was whether the method of delivery affected the efficacy of the vaccine. The null hypothesis (H0: p1 = p2) is that the proportion of participants who develop the flu is the same for both methods of delivery, and the alternative hypothesis (HA: p1 ≠ p2) is that the proportions differ. The study involved a sample of 1000 randomly chosen individuals (N = 1000) sorted into two groups; group one received the vaccine via nasal spray (n1 = 500) and group two received it by injection (n2 = 500). The level of significance was set at .05 and the calculated p-value was .0008. Of the 500 participants who received the nasal spray, 120 contracted the flu while 380 did not (p1 = .24). Of the 500 participants who received the injection, 80 developed the flu while 420 did not (p2 = .16). Because the calculated p-value of .0008 is less than the established level of significance of .05, the results are statistically significant and the null hypothesis is rejected. If the null hypothesis were true, a difference at least this large would be expected in only about 8 out of every 10,000 such studies, which is very unlikely. The results therefore provide sufficient evidence to support the alternative hypothesis that the two delivery methods differ in effectiveness, with the injection group showing the lower rate of flu.
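The reported figures can be checked with a pooled two-proportion z test. The sketch below assumes that this is the test behind the reported p-value, since the prompt does not name it; the function name `two_proportion_z_test` is illustrative only. Under that assumption, the one-sided p-value comes out close to the reported .0008, while the two-sided p-value is about twice as large. Both fall below the .05 level of significance.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test of H0: p1 = p2 (assumed test, see lead-in)."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, so the counts are pooled.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal tail areas via the complementary error function.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    p_one_sided = p_two_sided / 2  # tail in the direction of the observed difference
    return z, p_one_sided, p_two_sided

# Nasal spray: 120 of 500 developed the flu; shot: 80 of 500.
z, p_one, p_two = two_proportion_z_test(120, 500, 80, 500)
print(f"z = {z:.2f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

Whichever tail convention is used, the decision to reject the null hypothesis at the .05 level is the same.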

The sample size of 1000 participants is adequate and will likely produce results that are a reliable representation of the entire population. It is important to note that very little information was provided in the assignment prompt, which makes assessing the validity of the study and its findings challenging. I assume that the sample was selected randomly from a pool of participants with a high level of homogeneity (similar ages, comparable health, etc.) and, on that assumption, feel confident that the study and its findings were valid with relatively few limitations. One limitation was that no control group was included in the study. Including a placebo-control group helps researchers keep common problems such as the placebo effect or observer error from skewing the findings (Tanner, 2011). Additionally, the study did not specify how participants were exposed to the flu virus, whether naturally by way of their regular routines or through intentional lab-controlled exposure. This information could be important as strains vary in aggressiveness, severity of symptoms, and ease of transmission (Centers for Disease Control and Prevention, 2015). Finally, this study was limited in that it could not easily be conducted as a blind or double-blind study. To make a follow-up study blind or double-blind, each participant would have to receive both a nasal spray and a shot, one containing the vaccine and the other a placebo.

Essay Two

In this scenario a researcher is interested in the relationship between intelligence quotient (I.Q.) and grade point average (G.P.A.). To study it, the researcher would collect interval data such as I.Q. test scores and G.P.A. records and compute Pearson’s correlation coefficient, which measures the strength and direction of a linear relationship between two variables. After confirming that the data met the assumptions of the test, such as a linear relationship and normal distribution, the researcher found a correlation of .75 (r = .75), which according to Tanner (2011) is considerably high. Correlation values of −1.0 or +1.0 (r = −1 and r = +1, respectively) indicate a perfect relationship and a correlation value of 0 (r = 0) indicates no linear relationship. Should the researcher wish to create a visual representation of the relationship between I.Q. and G.P.A., he could enter the data into a scatterplot with one variable on the y axis and the other on the x axis. With a correlation value of .75, a clear picture of the strong positive relationship would appear, with the two variables mainly moving together from the lower left of the plot to the upper right. Conversely, if the relationship had been negative, the pattern would move from the upper left of the plot to the lower right. Generally speaking, the less “scatter” in the plot, the stronger the relationship, regardless of whether the correlation is positive or negative (Tanner, 2011).
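To make the calculation concrete, the following is a minimal sketch of Pearson's correlation coefficient computed from its definition. The I.Q. and G.P.A. values are hypothetical, invented purely for illustration, because the researcher's raw data are not given; the helper name `pearson_r` is likewise an assumption. The last lines square the reported r of .75, which shows that about 56% of the variance in one variable is shared with the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for eight students (NOT the researcher's data).
iq = [95, 100, 105, 110, 115, 120, 125, 130]
gpa = [2.9, 2.4, 3.1, 2.8, 3.5, 3.0, 3.6, 3.4]
print(f"r for the hypothetical sample = {pearson_r(iq, gpa):.2f}")

# Coefficient of determination for the correlation reported in the prompt.
reported_r = 0.75
print(f"r squared = {reported_r ** 2}")  # proportion of shared variance
```

A positive r means the two variables tend to rise together; it says nothing about which one, if either, drives the other.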

Despite the strong correlation the researcher has established in this example, it is important to keep in mind that correlation alone is not sufficient evidence of causation. While it is true that students with higher I.Q. levels often have high grade point averages, there are other factors believed to contribute to the relationship including the availability of educational resources such as tutoring, strong organization skills, diet, stress levels, and adequate parental supervision and guidance (Bernard, Nalbone, Hecker, & Degges-White, 2015). In fact, correlation simply implies that two variables are related in some way. To effectively determine causation, one would need to isolate the related variables through methods such as randomized controlled trials (RCTs) which greatly reduce the possibility of skewed findings due to confounding variables. The identification of a strong correlating relationship between high I.Q. and high G.P.A. may not be enough to establish causation but there are plenty of beneficial ways that the relationship can be used. For example, it could be used as a platform on which to build effective after-school programs that offer nutritional information and resources, tutoring services, and parent-student study sessions to economically-disadvantaged students.

Essay Three

In this section of my final exam I will provide the data set for each group and fulfil the requirements of the essay prompts as a means of demonstrating both my comprehension of the concepts and terms used and my ability to effectively organize data and interpret results. The information provided in this section can also be found in the appendices at the end of the paper. A researcher has compiled a list of reaction times for 20 individuals who have completed a memory-related assessment test. This is the data set produced: (See Appendix A)

2.2, 4.7, 7.3, 4.1, 9.5, 15.2, 4.3, 9.5, 2.7, 3.1, 9.2, 2.9, 8.2, 7.6, 3.5, 2.5, 9.3, 4.8, 8.5, 8.1

The sum, which is the total of the entire data set, is Σx = 127.2.

The data is to be divided into two groups of 10; group A contains the lowest 10 reaction times (See Appendix B) and group B contains the 10 highest reaction times (See Appendix C).

The sum of group A is 34.8 and for group B the sum is 92.4. The mean, which is the sum of the data set divided by the number of items in the set, for group A is M = 3.48 and for group B is M = 9.24. The mode is the value that appears most frequently within the set. In group A every value (2.2, 2.5, 2.7, 2.9, 3.1, 3.5, 4.1, 4.3, 4.7, 4.8) appears exactly once, so no single value stands out as the mode; for group B the mode is Mo = 9.5.

The median represents the middle score when a set of data is arranged in order. The median for group A is Mdn = 3.3 and for group B the median is Mdn = 8.85. The standard deviation of a data set is the square root of the variance; for group A the standard deviation is s = 0.94 and for group B it is s = 2.24. The range marks the difference between the highest and lowest values within the set; for group A it is 2.6 and for group B it is 7.9. Skewness describes the asymmetry of the distribution and for group A is sk = 0.15 while for group B it is sk = 1.95. Finally, kurtosis describes how heavy the tails of the distribution are relative to its center. The kurtosis of group A is kur = 1.4 and for group B is kur = 5.7.
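The values above can be reproduced with Python's standard `statistics` module. This is a sketch rather than the method actually used for the essay, which relied on an online calculator. The helper name `describe` is illustrative. Skew and kurtosis are left out because calculators differ in the formulas they apply, so those two values cannot be checked without knowing which formula was used.

```python
import statistics

times = [2.2, 4.7, 7.3, 4.1, 9.5, 15.2, 4.3, 9.5, 2.7, 3.1,
         9.2, 2.9, 8.2, 7.6, 3.5, 2.5, 9.3, 4.8, 8.5, 8.1]

# Sort once, then split into the 10 lowest and 10 highest reaction times.
ordered = sorted(times)
group_a, group_b = ordered[:10], ordered[10:]

def describe(data):
    """Basic descriptive statistics, rounded for reporting."""
    return {
        "sum": round(sum(data), 2),
        "mean": round(statistics.mean(data), 2),
        "median": round(statistics.median(data), 3),
        "modes": statistics.multimode(data),       # all values tied for most frequent
        "stdev": round(statistics.stdev(data), 2),  # sample SD (n - 1 denominator)
        "range": round(max(data) - min(data), 2),
    }

for name, group in (("A", group_a), ("B", group_b)):
    print(name, describe(group))
```

Because no value repeats in group A, `multimode` returns all ten values for that group, while for group B it returns only 9.5.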

The descriptive values of the two data sets differ considerably. The sums differ because one set comprises the lowest values of the original set and the other contains the highest values. The modes differ because no value repeats in group A, so every value appears exactly once, whereas in group B the value 9.5 appears twice. Outliers also contribute to the differences. Outliers are extreme scores that stand out from the rest of the scores in a data set. Group A contains no obvious outlier, but in group B the reaction time of 15.2 lies far above the other nine values, which range from 7.3 to 9.5. This single score pulls the group B mean (M = 9.24) above its median (Mdn = 8.85) and inflates the standard deviation and range of that group.
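One common way to screen for outliers is Tukey's 1.5 × IQR rule, which flags any score more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below assumes that rule, which is not named in the assignment, along with the default quartile method of `statistics.quantiles`. The helper name `iqr_outliers` is illustrative. It also compares the group B mean with and without the flagged score.

```python
import statistics

group_a = [2.2, 2.5, 2.7, 2.9, 3.1, 3.5, 4.1, 4.3, 4.7, 4.8]
group_b = [7.3, 7.6, 8.1, 8.2, 8.5, 9.2, 9.3, 9.5, 9.5, 15.2]

def iqr_outliers(data, k=1.5):
    """Return the scores lying outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print("Group A outliers:", iqr_outliers(group_a))
print("Group B outliers:", iqr_outliers(group_b))

# Effect of the flagged score on the group B mean.
flagged = iqr_outliers(group_b)
trimmed = [x for x in group_b if x not in flagged]
print("Group B mean with outlier:   ", round(statistics.mean(group_b), 2))
print("Group B mean without outlier:", round(statistics.mean(trimmed), 2))
```

Other quartile conventions shift the fences slightly, but 15.2 is far enough from the rest of group B that the conclusion is unlikely to change.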

Next, the data points of group A (See Appendix D) and group B (See Appendix E) were each repeated so that every group contained 20 values, which produced the following results: The sum for group A is now 69.6 and for group B is 184.8. The mean for group A is M = 3.48 and M = 9.24 for group B. The modes remained the same: in group A every value still appears equally often (now twice each), and in group B the mode is Mo = 9.5. The median for group A is Mdn = 3.3 and for group B is Mdn = 8.85. The group A standard deviation is s = 0.92 and s = 2.18 for group B. The range for group A is 2.6 and 7.9 for group B. After doubling, the skew for group A was sk = 0.30 and for group B was sk = 2. The kurtosis value for group A became kur = 1.64 and for group B kur = 6.02. Repeating the data doubled each sum but left the mean, median, mode, and range unchanged, because these depend only on which values occur and in what proportion. The sample standard deviation decreased slightly because its n − 1 denominator grew from 9 to 19 while the spread of the scores stayed the same.
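The effect of repeating each group can be checked directly. The sketch below, again using the standard `statistics` module with an illustrative helper named `summary`, prints the same measures for each group before and after the 10 scores are repeated. It also shows that the population standard deviation (n denominator) does not change at all, which isolates the n − 1 correction as the reason the sample standard deviation moves.

```python
import math
import statistics

group_a = [2.2, 2.5, 2.7, 2.9, 3.1, 3.5, 4.1, 4.3, 4.7, 4.8]
group_b = [7.3, 7.6, 8.1, 8.2, 8.5, 9.2, 9.3, 9.5, 9.5, 15.2]

def summary(data):
    """Sum, mean, median, sample SD, and range, rounded for reporting."""
    return {
        "n": len(data),
        "sum": round(sum(data), 2),
        "mean": round(statistics.mean(data), 2),
        "median": round(statistics.median(data), 3),
        "stdev": round(statistics.stdev(data), 2),
        "range": round(max(data) - min(data), 2),
    }

for name, group in (("A", group_a), ("B", group_b)):
    doubled = group * 2  # repeat the same 10 scores, giving 20 data points
    print(name, "original:", summary(group))
    print(name, "doubled: ", summary(doubled))
    # The population SD divides by n, so duplication leaves it unchanged.
    same = math.isclose(statistics.pstdev(group), statistics.pstdev(doubled))
    print(name, "population SD unchanged:", same)
```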

Part II: Research Study Critique

The article I will be critiquing for the second part of my final exam is an article from 2010 titled “Parenthood: A Contributing Factor to Childhood Obesity”.

Introduction

This article is in response to data from 2004 by The National Health and Nutrition Examination Survey (NHANES) which suggested that national obesity rates among children increased from 6.5% in 1980 to 17.1% in 2004. The authors of the selected article analyzed the information provided by the NHANES to identify specific predictors which might play a pivotal role in the rising childhood obesity epidemic. Variables such as race, physical inactivity, economic standing, single and double parent households, and parental involvement were considered and analyzed. The secondary objective of the authors was to assess differences in dietary and blood cholesterol of children from single parent versus dual-parent households. They hypothesized (H1) that in a national population, children from single parent households would have higher obesity rates than children from dual-parent households. The null hypothesis (H0) would be that no differences in obesity rates between children from single and dual parent households would be observed.

Methods

The authors believed the data provided by the NHANES were ideal for their study because they matched the information of children to their parents and included a large sample of 1,000 participants between the ages of 6 and 11 years. The 1000 children came from 219 households with single parents (N = 219) and 780 with two parents (N = 780). Gender distribution between the two groups was equal. The physical variables included in the analysis for the children were age, gender, race, height, weight, and body mass index (BMI). Obesity was determined according to the guidelines provided by the Centers for Disease Control and Prevention (CDC), which categorize a child as obese when they fall above the 95th percentile for their age, height, and weight. Recall data of dietary intake for the previous 24-hour period were collected and included the consumption of total energy, carbohydrates, proteins, total fat, saturated fat, and mono/poly unsaturated fatty acids (Huffman, Kanikireddy, & Patel, 2010). The variables analyzed for the parents included gender, marital status, level of education, and annual family income.

Results

Initial findings showed that children from single-parent households were significantly more overweight (p < 0.01) than children from dual-parent households (p < 0.05), and that black children were heavier than children from all other racial backgrounds (p < 0.04). These preliminary findings suggested to the researchers that their hypothesis was correct. To investigate these relationships further, the researchers randomly selected a sample from the NHANES study population and asked them to participate in a smaller-scale study. Following the original study design used in the NHANES survey, the researchers also included in-depth interviews of each participant and the parent or parents with whom they lived. In the follow-up study the researchers used BMI as the primary measurement of obesity when comparing children from single- and dual-parent households. A one-way analysis of variance (ANOVA) was then performed to compare the BMI status of children from different racial groups living in single-parent households. To test the association between obesity and parental status, logistic regression models were fitted with obese versus not obese as the dependent variable (DV), single-parent household as the independent variable (IV), and age, income, race, and education as the covariates (CV). The final sample weight and Taylor’s linearization methods were used for variance estimation. Statistical analyses were conducted and the database was imported for the logistic regression analyses. For all methods of analysis, p < 0.05 was considered significant (Huffman, Kanikireddy, & Patel, 2010). Findings from the subsequent study showed that the proportion of overweight children from single-parent households (41%) was greater than that of children from dual-parent households (31%). Participants from dual-parent households had significantly (p < 0.01) lower BMI (19.2 ± 5.4) than participants living in single-parent households (21.5 ± 6.5).

The difference in total calories consumed was marginally significant (p < 0.06) between participants from single-parent households (1,910 ± 24) and dual-parent households (1,860 ± 25). Mean low-density lipoprotein (LDL) level, the cholesterol considered “bad” by the medical and nutritional community, was also higher among participants from single-parent households (91 ± 1.5) than among participants from dual-parent households (88 ± 1.4; p < 0.05). Total fat and saturated fat intakes (g/day) were higher (p < 0.05) for single-parent participants (9.1 ± 6.2; 3.3 ± 1.0) than for participants from two-parent households (8.6 ± 5.2; 3.2 ± 1.1). African-American youths were found to be more overweight and had higher (p < 0.04) BMI (20.4 ± 2.2) than White youths (19.2 ± 2.2) and youths from all other racial backgrounds (18.9 ± 3). Using binary logistic regression models, the researchers determined that the odds of obesity were significantly higher for children in single-parent households than for children from dual-parent homes (OR: 1.72 (1.24, 2.38), p = 0.001). Covariates such as parental income, level of education, race, and age were also included in the logistic regression models. The researchers believed the findings supported their original hypothesis.
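A reader can check that the reported odds ratio, interval, and p-value agree with one another. The sketch below assumes that the interval (1.24, 2.38) is a 95% Wald confidence interval computed on the log-odds scale, which is the usual output of logistic regression but is not stated in this summary. The function name `z_from_odds_ratio_ci` is illustrative. Under that assumption the implied two-sided p-value is close to the reported 0.001.

```python
import math

def z_from_odds_ratio_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the standard error, z statistic, and two-sided p from an OR and its CI.

    Assumes a Wald interval on the log scale: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided

se, z, p = z_from_odds_ratio_ci(1.72, 1.24, 2.38)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.4f}")

# A Wald interval is symmetric on the log scale, so the geometric mean of its
# limits should land near the point estimate.
print(f"geometric midpoint of CI = {math.sqrt(1.24 * 2.38):.2f}")
```

Because the interval excludes 1, the value that would indicate equal odds in the two household types, the result is significant at the level used to build the interval.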

Discussion

The findings of the smaller, subsequent study aligned with the findings from nearly every other study previously conducted regarding childhood obesity and household construct. One exception was a study in which findings suggested children from foster homes or living with grandparents had lower rates of obesity.

Overall I believe this study was conducted ethically, with most of the important variables accounted for. The limitations I noted were small and possibly insignificant; they included the omission of information about how many households were single-parent because the second parent was deceased, and about the overall mental health of the parent(s) in the study. These variables could slightly bias the interpretation of the findings. Nevertheless, this study and its findings can be instrumental in the construction of programs designed to lower childhood obesity rates on a global scale. Doctors, therapists, and school administrators might also benefit from this study in the assessment and detection of at-risk children.

In conclusion, the importance of statistics within the social and behavioral sciences is far greater than many people, mainly students, often recognize. Without the studies and research efforts of those in the field of statistical analysis, many helping professionals would not have the tools or knowledge to treat their patients and clients or assist them in learning to live a healthy, autonomous life. This course has been a challenge for me, but I believe I have gained a basic knowledge of statistics and, more importantly, a better understanding of how great a contribution statistics has made to the field to which I have chosen to dedicate my life.

References

Bernard, J. M., Nalbone, D. P., Hecker, L. L., & Degges-White, S. E. (2015). Co-Parenting factors that contribute to academic success. European Scientific Journal, 241. http://search.ebscohost.com.proxy-library.ashford.edu/login.aspx?direct=true&db=edsgao&AN=edsgcl.424568329&site=eds-live

Centers for Disease Control and Prevention. (2015, October 2). Key facts about seasonal

Furey, E. (2016). Stem and leaf plot generator [Online calculator resource]. Retrieved from http://www.CalculatorSoup.com

Huffman, F. G., Kanikireddy, S., & Patel, M. (2010). Parenthood: A contributing factor to childhood obesity. International Journal of Environmental Research and Public Health, 7(7), 2800-2810. doi:10.3390/ijerph7072800

Tanner, D. (2011). Statistics for the Behavioral & Social Sciences. San Diego, CA: Bridgepoint Education, Inc.

Appendix A

Original Data Set/Essay 3

(Source: Furey, 2016)

Appendix B

Data Set/Essay 3 (Group A)

(Source: Furey, 2016)

Appendix C

Data Set/Essay 3 (Group B)

(Source: Furey, 2016)

Appendix D

Data Set/Essay 3 (Group A doubled)

(Source: Furey, 2016)

Appendix E

Data Set/Essay 3 (Group B doubled)

(Source: Furey, 2016)