\[\chi^2 = \sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}}\]
Compare and contrast the two questions:
Below is a table describing average amount of time spent on the computer in two groups:
| Avg. Hours per day | Group A | Group B | 
|---|---|---|
| \(<1\) hour | \(30\) | \(26\) | 
| \(1-3\) hours | \(35\) | \(42\) | 
| \(3-5\) hours | \(25\) | \(22\) | 
| \(>5\) hours | \(10\) | \(10\) | 
| Totals | \(100\) | \(100\) | 

\(H_0:\) the groups are homogeneous (both samples come from the same or similar distributions)

\(H_A:\) the groups are not homogeneous (the samples seem to come from different distributions)
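# Test of homogeneity: we want to know whether Groups A and B come from the same distribution.
# Expected count for each cell = (row total * column total) / grand total.
e_11 <- 56*100/200  # <1 hour, Group A
e_12 <- 56*100/200  # <1 hour, Group B
e_21 <- 77*100/200  # 1-3 hours, Group A
e_22 <- 77*100/200  # 1-3 hours, Group B
e_31 <- 47*100/200  # 3-5 hours, Group A
e_32 <- 47*100/200  # 3-5 hours, Group B
e_41 <- 20*100/200  # >5 hours, Group A
e_42 <- 20*100/200  # >5 hours, Group B
# Chi-square statistic: sum of (observed - expected)^2 / expected over all eight cells.
chi <- (30-e_11)^2/e_11 + (26-e_12)^2/e_12 +
  (35-e_21)^2/e_21 + (42-e_22)^2/e_22 +
  (25-e_31)^2/e_31 + (22-e_32)^2/e_32 +
  (10-e_41)^2/e_41 + (10-e_42)^2/e_42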
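To turn the homogeneity statistic into a decision, it has to be compared with a chi-square distribution with \((4-1)(2-1) = 3\) degrees of freedom. The following is a minimal sketch of that step, assuming the counts from the table above; the names `observed`, `expected`, `chi_stat`, `p_value`, and `builtin` are illustrative and not part of the original notes.

```r
# Observed counts from the table (rows: hour bands, columns: groups).
observed <- matrix(c(30, 26,
                     35, 42,
                     25, 22,
                     10, 10),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("<1", "1-3", "3-5", ">5"), c("A", "B")))

# Expected counts: (row total * column total) / grand total for every cell.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic and degrees of freedom: (rows - 1) * (columns - 1).
chi_stat <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution.
p_value <- pchisq(chi_stat, df = df, lower.tail = FALSE)

# Cross-check against R's built-in test on the same table.
builtin <- chisq.test(observed)
```

With these counts the statistic is about 1.11 and the p-value is about 0.77, well above the usual 0.05 threshold.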
# Our data do not provide statistically significant evidence that the distributions are different from one another.

# Goodness-of-fit test: we want to know whether Group A matches the expected distribution, which is taken to be Group B.
chi <- (30-26)^2/26 + (35-42)^2/42 + (25-22)^2/22 + (10-10)^2/10
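The goodness-of-fit statistic can be checked in the same way. This is a sketch under the assumption that Group B's counts define the expected proportions; `observed_A`, `expected_B`, `p_B`, `chi_gof`, `p_gof`, and `gof` are illustrative names, not from the original notes.

```r
# Observed counts for Group A and expected counts taken from Group B.
observed_A <- c(30, 35, 25, 10)
expected_B <- c(26, 42, 22, 10)

# Statistic by hand: sum of (observed - expected)^2 / expected.
chi_gof <- sum((observed_A - expected_B)^2 / expected_B)

# Degrees of freedom for goodness of fit: number of categories - 1.
p_gof <- pchisq(chi_gof, df = length(observed_A) - 1, lower.tail = FALSE)

# Built-in version: chisq.test() takes the expected distribution as proportions.
p_B <- expected_B / sum(expected_B)
gof <- chisq.test(observed_A, p = p_B)
```

Here the statistic is about 2.19 on 3 degrees of freedom, giving a p-value of about 0.53.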
# These data do not provide statistically significant evidence that Group A differs from the expected distribution.

For the following section we will use data from “Learning Statistics with R” by Danielle Navarro:
load("~/MAT104-Spring25/Week12/parenthood.Rdata")
load("~/MAT104-Spring25/Week12/pearson_correlations.RData")
load("~/MAT104-Spring25/Week12/effort.Rdata")

Consider the following plots:
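library(ggplot2)  # ggplot() comes from the ggplot2 package
# dan.grump plotted against dan.sleep
ggplot(parenthood, aes(x=dan.sleep, y=dan.grump)) + geom_point()
# dan.grump plotted against baby.sleep
ggplot(parenthood, aes(x=baby.sleep, y=dan.grump)) + geom_point()

The Pearson correlation coefficient \(r_{XY}\) is a standardized covariance measure: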
\[r_{XY} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i-\bar{X}}{s_X}\frac{Y_i-\bar{Y}}{s_Y} \]
where \(\bar{X}\) and \(\bar{Y}\) are the sample means and \(s_X\) and \(s_Y\) are the sample standard deviations of \(X\) and \(Y\).
The R code for the Pearson correlation coefficient is:
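A minimal sketch of that computation, using base R's `cor()`, whose default method is Pearson. The vectors `x` and `y` are made-up illustrative values, not the `parenthood` data, and `r_builtin` and `r_manual` are illustrative names. The second calculation applies the formula above term by term to show that both give the same number.

```r
# Hypothetical illustrative data (not taken from the parenthood data set).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Built-in Pearson correlation (method = "pearson" is the default).
r_builtin <- cor(x, y, method = "pearson")

# Same quantity from the formula: sum of products of z-scores, divided by N - 1.
# sd() also uses the N - 1 denominator, so the two definitions agree.
N <- length(x)
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)
r_manual <- sum(z_x * z_y) / (N - 1)
```

For the data loaded above, the corresponding call would presumably be `cor(parenthood$dan.sleep, parenthood$dan.grump)`, assuming both columns are numeric.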
Below is data with various Pearson correlation coefficients:
ggplot(outcomes, aes(x=V1,y=V2))+geom_point() + facet_wrap(~pearson)

| Correlation | Strength | Direction | 
|---|---|---|
| \(-1\) to \(-0.9\) | Very Strong | Negative | 
| \(-0.9\) to \(-0.7\) | Strong | Negative | 
| \(-0.7\) to \(-0.4\) | Moderate | Negative | 
| \(-0.4\) to \(-0.2\) | Weak | Negative | 
| \(-0.2\) to \(0\) | Negligible | Negative | 
| \(0\) to \(0.2\) | Negligible | Positive | 
| \(0.2\) to \(0.4\) | Weak | Positive | 
| \(0.4\) to \(0.7\) | Moderate | Positive | 
| \(0.7\) to \(0.9\) | Strong | Positive | 
| \(0.9\) to \(1\) | Very Strong | Positive |
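
The table can be applied mechanically with a small helper. This is a sketch only: `describe_correlation` is a hypothetical function, not part of the original notes. The table lists each boundary value in two rows, so the code has to pick a convention. Here, as an assumption, a boundary value such as \(0.7\) goes to the weaker band, and \(r = 0\) is labeled positive.

```r
# Hypothetical helper: label correlations using the bands in the table above.
# Assumption: bands are closed on the right in |r|, so |r| = 0.7 is "Moderate";
# r = 0 is labeled "Positive". The table itself does not settle either case.
describe_correlation <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  strength <- cut(abs(r),
                  breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
                  labels = c("Negligible", "Weak", "Moderate",
                             "Strong", "Very Strong"),
                  include.lowest = TRUE, right = TRUE)
  direction <- ifelse(r < 0, "Negative", "Positive")
  data.frame(r = r,
             strength = as.character(strength),
             direction = direction,
             stringsAsFactors = FALSE)
}

# Example: classify a few correlations at once.
res <- describe_correlation(c(-0.95, -0.5, 0.1, 0.8))
```

The result is a data frame with one row per correlation, so it can be used directly on a vector of coefficients such as the ones shown in the faceted plot.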