Outcome | Observed | Expected |
---|---|---|
\(1\) | \(50,611\) | \(50,000\) |
\(2\) | \(49,523\) | \(50,000\) |
\(3\) | \(49,812\) | \(50,000\) |
\(4\) | \(49,924\) | \(50,000\) |
\(5\) | \(49,672\) | \(50,000\) |
\(6\) | \(50,458\) | \(50,000\) |
Total: | \(300,000\) | \(300,000\) |
\(H_0\): the die follows a uniform distribution. No side of the die is favored.
\(H_A\): the die does not follow a uniform distribution. Some sides are favored.
We need to compute a statistic that summarizes how far our data are from what a uniform distribution would produce.
The statistic we will use is called the chi-square (or chi-squared) statistic.
Once we have the statistic, we calculate its p-value and compare it to the significance level.
\[ \displaystyle \chi^2 = \sum \frac{ (\text{observed}- \text{expected})^2}{\text{expected}}\]
Outcome | Observed | Expected | \(\frac{ (\text{observed}- \text{expected})^2}{\text{expected}}\) |
---|---|---|---|
\(1\) | \(50,611\) | \(50,000\) | \(7.46642\) |
\(2\) | \(49,523\) | \(50,000\) | \(4.55058\) |
\(3\) | \(49,812\) | \(50,000\) | \(0.70688\) |
\(4\) | \(49,924\) | \(50,000\) | \(0.11552\) |
\(5\) | \(49,672\) | \(50,000\) | \(2.15168\) |
\(6\) | \(50,458\) | \(50,000\) | \(4.19528\) |
Total: | \(300,000\) | \(300,000\) | \(19.18636\) |
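The last column of the table can be reproduced in R. This is a minimal sketch; the names `observed`, `expected`, `contrib`, and `chi_sq` are illustrative and are not used elsewhere in these notes.

```r
# observed counts for faces 1 through 6, copied from the table above
observed <- c(50611, 49523, 49812, 49924, 49672, 50458)

# under H_0 every face is equally likely, so each expected count is total / 6
expected <- rep(sum(observed) / 6, 6)

# contribution of each face: (observed - expected)^2 / expected
contrib <- (observed - expected)^2 / expected
contrib

# the chi-square statistic is the sum of the six contributions
chi_sq <- sum(contrib)
chi_sq
```

Because the arithmetic is vectorized, the same lines work for any number of outcomes as long as `observed` and `expected` have the same length.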
# our chi-square statistic is 19.18636
# we will place our chi-square statistic on the chi-square distribution with the correct degrees of freedom
# the p-value will be the area to the right of the chi-square statistic
# for a goodness-of-fit test the degrees of freedom is the # of bins - 1 (in this case 6-1 = 5)
1-pchisq(19.186,5)
## [1] 0.001774658
# we obtain a p-value of 0.00177
# this is extremely rare: if the die were fair, only about 0.177% of experiments with 300,000 rolls would land at least this far from the uniform distribution
# these data provide statistically significant evidence that the die does not follow a uniform distribution.
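As a cross-check, base R's `chisq.test()` can run the same goodness-of-fit test directly from the observed counts. This is a sketch, assuming the counts are entered as a plain vector; the names `observed` and `fit` are illustrative.

```r
# observed counts for faces 1 through 6
observed <- c(50611, 49523, 49812, 49924, 49672, 50458)

# p gives the probabilities claimed by H_0: each face has probability 1/6
fit <- chisq.test(observed, p = rep(1/6, 6))

fit$statistic  # chi-square statistic
fit$parameter  # degrees of freedom (number of bins - 1)
fit$p.value    # area to the right of the statistic
```

The p-value can differ slightly in the later decimal places from the hand calculation because `chisq.test()` uses the unrounded statistic.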
Example: Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?
Grade level | Grades | Popular | Sports | Total |
---|---|---|---|---|
\(4^{th}\) | \(63\) | \(31\) | \(25\) | \(119\) |
\(5^{th}\) | \(88\) | \(55\) | \(33\) | \(176\) |
\(6^{th}\) | \(96\) | \(55\) | \(32\) | \(183\) |
Totals: | \(247\) | \(141\) | \(90\) | \(478\) |
\(H_0:\) The two variables are independent. Grade level does not affect what a student finds most important.
\(H_A:\) The two variables are dependent. Grade level does affect what a student finds most important.
To test this we again want to compute a chi-square statistic \(\chi^2\):
\[ \displaystyle \chi^2= \sum \frac{ (\text{observed}- \text{expected})^2}{\text{expected}}\]
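If the variables are independent, the expected count for each cell of the table is:
\[ \displaystyle \text{expected} = \frac{ (\text{row total} \cdot \text{column total})}{\text{table total}}\]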
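As a worked example for a single cell, take the 4th graders who chose grades (observed count \(63\), row total \(119\), column total \(247\), table total \(478\)). The values below are rounded:

\[ \displaystyle \text{expected} = \frac{119 \cdot 247}{478} \approx 61.49 \qquad \frac{ (63 - 61.49)^2}{61.49} \approx 0.037\]

This cell contributes only about \(0.037\) to \(\chi^2\), because the observed count is close to the count expected under independence.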
# expected count for each cell = (row total * column total) / table total
# e.g. grades_4 is the expected number of 4th graders who think grades are the most important
grades_4 <- (119*247)/478
pop_4 <- (119*141)/478
sport_4 <- (119*90)/478
grades_5 <- (176*247)/478
pop_5 <- (176*141)/478
sport_5 <- (176*90)/478
grades_6 <- (183*247)/478
pop_6 <- (183*141)/478
sport_6 <- (183*90)/478
chisq <- (63-grades_4)^2/grades_4 + (31-pop_4)^2/pop_4 + (25-sport_4)^2/sport_4 +
(88-grades_5)^2/grades_5 + (55-pop_5)^2/pop_5 + (33-sport_5)^2/sport_5 +
(96-grades_6)^2/grades_6 + (55-pop_6)^2/pop_6 + (32-sport_6)^2/sport_6
# for a test of independence the degrees of freedom is (# of rows -1 )* (# of columns - 1) in this case (3 -1 ) * (3 -1) = 4
1-pchisq(chisq,4)
## [1] 0.8593185
# we have a p-value of 85.9%; at any reasonable significance level we fail to reject the null hypothesis.
# Our data do not provide statistically significant evidence that grade level affects what students find important.
# computing every expected count by hand is tedious, so we need another way!
df <- data.frame(grades = c(63,88,96), popular = c(31,55,55), sports = c(25,33,32))
chisq.test(df)
##
## Pearson's Chi-squared test
##
## data: df
## X-squared = 1.3121, df = 4, p-value = 0.8593
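The object returned by `chisq.test()` also stores the expected counts, which makes it easy to compare against the hand calculation. This is a sketch that enters the same counts as a labeled matrix; the names `counts` and `result` are illustrative.

```r
# observed counts: rows are grade levels, columns are the most important goal
counts <- matrix(c(63, 31, 25,
                   88, 55, 33,
                   96, 55, 32),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(grade = c("4th", "5th", "6th"),
                                 goal  = c("Grades", "Popular", "Sports")))

result <- chisq.test(counts)

# expected counts under independence: (row total * column total) / table total
result$expected

# rebuilding the statistic from observed and expected counts
manual_stat <- sum((counts - result$expected)^2 / result$expected)
manual_stat

result$statistic  # should agree with manual_stat
result$parameter  # degrees of freedom: (3 - 1) * (3 - 1)
result$p.value
```

Looking at `result$expected` is also a convenient way to confirm that no expected count is very small, since the chi-square approximation is less reliable in that case.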
Pressure to Succeed | High Anxiety | Medium-High Anxiety | Medium Anxiety | Medium-Low Anxiety | Low Anxiety | Total |
---|---|---|---|---|---|---|
High | \(35\) | \(42\) | \(53\) | \(15\) | \(10\) | \(155\) |
Medium | \(18\) | \(48\) | \(63\) | \(33\) | \(31\) | \(193\) |
Low | \(4\) | \(5\) | \(11\) | \(15\) | \(17\) | \(52\) |
Total | \(57\) | \(95\) | \(127\) | \(63\) | \(58\) | \(400\) |
Is there sufficient evidence to conclude that a student’s anxiety level depends on the pressure to succeed?
\(H_0:\)
\(H_A:\)
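One possible way to set this exercise up in R is to enter the body of the table (without the totals) as a matrix and pass it to `chisq.test()`. This is only a sketch; the names `anxiety` and `anxiety_test` are illustrative, and stating the hypotheses and the conclusion is left to the reader.

```r
# rows: pressure to succeed; columns: anxiety level (totals are not entered)
anxiety <- matrix(c(35, 42, 53, 15, 10,
                    18, 48, 63, 33, 31,
                     4,  5, 11, 15, 17),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(pressure = c("High", "Medium", "Low"),
                                  anxiety  = c("High", "Medium-High", "Medium",
                                               "Medium-Low", "Low")))

# quick check that the data were typed in correctly
rowSums(anxiety)
colSums(anxiety)

anxiety_test <- chisq.test(anxiety)
anxiety_test$parameter  # degrees of freedom: (3 - 1) * (5 - 1)
anxiety_test$p.value    # compare to the chosen significance level
```

`rowSums()` and `colSums()` give the margins of the entered data, which can be compared with the totals in the table before trusting the test result.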
Day | Monday | Tuesday | Wednesday | Thursday |
---|---|---|---|---|
number of absences | 15 | 12 | 9 | 9 |
Suppose there are \(60\) absences in an average week. Test the goodness of fit of this data to a uniform distribution with a significance level of \(.05\).