\[\chi^2 = \sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}}\]
Compare and contrast the two questions:
The table below gives the counts for two groups, broken down by the average number of hours per day spent on the computer:
Avg. Hours per day | Group A | Group B |
---|---|---|
\(<1\) hour | \(30\) | \(26\) |
\(1-3\) hours | \(35\) | \(42\) |
\(3-5\) hours | \(25\) | \(22\) |
\(>5\) hours | \(10\) | \(10\) |
Totals | \(100\) | \(100\) |
\(H_0:\) The two groups are homogeneous (both samples come from the same distribution).

\(H_A:\) The two groups are not homogeneous (the samples come from different distributions).
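# Test of homogeneity: do Group A and Group B come from the same distribution?
# Expected count for each cell = row total * column total / grand total.
e_11 <- 56*100/200   # <1 hour, Group A
e_12 <- 56*100/200   # <1 hour, Group B
e_21 <- 77*100/200   # 1-3 hours, Group A
e_22 <- 77*100/200   # 1-3 hours, Group B
e_31 <- 47*100/200   # 3-5 hours, Group A
e_32 <- 47*100/200   # 3-5 hours, Group B
e_41 <- 20*100/200   # >5 hours, Group A
e_42 <- 20*100/200   # >5 hours, Group B
chi <- (30-e_11)^2/e_11 + (26-e_12)^2/e_12 + (35-e_21)^2/e_21 + (42-e_22)^2/e_22 + (25-e_31)^2/e_31 + (22-e_32)^2/e_32 + (10-e_41)^2/e_41 + (10-e_42)^2/e_42
# Degrees of freedom: (rows - 1) * (columns - 1) = (4 - 1) * (2 - 1) = 3
p_value <- pchisq(chi, df = 3, lower.tail = FALSE)
# chi is about 1.11 and the p-value is about 0.77, well above 0.05, so we fail to reject H_0:
# the data do not provide statistically significant evidence that the two distributions differ.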
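The hand computation above can be cross-checked in a more compact form. The following is a minimal sketch, assuming the counts are entered as a matrix; the names `observed`, `expected`, `chi_hom`, `df_hom`, `p_hom`, and `builtin` are illustrative and do not come from the original notes. It builds every expected count at once with `outer()` and compares the result with R's built-in `chisq.test()`.

```r
# Observed counts from the table (rows: hours per day, columns: groups).
observed <- matrix(
  c(30, 26,
    35, 42,
    25, 22,
    10, 10),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("<1", "1-3", "3-5", ">5"), c("A", "B"))
)

# Expected count for each cell: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic, degrees of freedom, and upper-tail p-value.
chi_hom <- sum((observed - expected)^2 / expected)
df_hom  <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_hom   <- pchisq(chi_hom, df = df_hom, lower.tail = FALSE)

# Built-in test for comparison; the continuity correction only applies
# to 2 x 2 tables, so it has no effect here.
builtin <- chisq.test(observed)
```

If the sketch is right, `builtin$statistic` and `builtin$p.value` should agree with `chi_hom` and `p_hom`.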
# Goodness-of-fit test: does Group A match the expected distribution given by Group B?
chi <- (30-26)^2/26 + (35-42)^2/42 + (25-22)^2/22 + (10-10)^2/10
p_value <- pchisq(chi, df = 4 - 1, lower.tail = FALSE)   # df = number of categories - 1
# chi is about 2.19 and the p-value is about 0.53, so the data do not provide statistically significant evidence that Group A differs from the expected distribution.
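The goodness-of-fit computation can also be run through `chisq.test()`, which accepts the expected distribution as proportions via its `p` argument. This is a sketch only; the variable names `group_a`, `group_b`, and `gof` are assumptions made for illustration.

```r
group_a <- c(30, 35, 25, 10)   # observed counts (Group A)
group_b <- c(26, 42, 22, 10)   # counts defining the expected distribution (Group B)

# chisq.test() takes expected *proportions* through the p argument,
# so the Group B counts are divided by their total.
gof <- chisq.test(x = group_a, p = group_b / sum(group_b))

gof$statistic   # chi-square statistic
gof$parameter   # degrees of freedom: number of categories - 1
gof$p.value     # upper-tail p-value
```

Note the contrast with the homogeneity test: here only Group A is treated as random, while Group B is taken as a fixed reference distribution, which is why the two tests give different statistics for the same table.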
For the following section we will use data from: “Learning Statistics with R” by Danielle Navarro:
load("~/MAT104-Spring25/Week12/parenthood.Rdata")
load("~/MAT104-Spring25/Week12/pearson_correlations.RData")
load("~/MAT104-Spring25/Week12/effort.Rdata")
Consider the following plots:
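library(ggplot2)   # provides ggplot(), aes(), and geom_point()
# Dan's grumpiness against his own sleep, then against the baby's sleep
ggplot(parenthood, aes(x = dan.sleep, y = dan.grump)) + geom_point()
ggplot(parenthood, aes(x = baby.sleep, y = dan.grump)) + geom_point()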
The Pearson correlation coefficient \(r_{XY}\) is a standardized covariance measure:
\[r_{XY} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i-\bar{X}}{s_X}\frac{Y_i-\bar{Y}}{s_Y} \]
where \(\bar{X}\) and \(\bar{Y}\) are the sample means and \(s_X\) and \(s_Y\) are the sample standard deviations.
In R, the Pearson correlation coefficient is computed with the `cor()` function, for example `cor(parenthood$dan.sleep, parenthood$dan.grump)`.
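To connect the formula to R, the sketch below computes \(r_{XY}\) term by term and compares it with the built-in `cor()`. The vectors `x` and `y` are made-up illustration data, not values from the parenthood data set, and the names `r_manual` and `r_builtin` are assumptions.

```r
# Hypothetical data, for illustration only (not the parenthood data set).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# Standardize each variable with its sample standard deviation,
# multiply the standardized values pairwise, sum, and divide by N - 1.
r_manual <- sum((x - mean(x)) / sd(x) * (y - mean(y)) / sd(y)) / (n - 1)

# Built-in Pearson correlation (the default method of cor()).
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)
```

Because `sd()` uses the \(N-1\) denominator, the manual computation matches the formula as written and should agree with `cor()` up to floating-point rounding.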
Below is data with various Pearson correlation coefficients:
ggplot(outcomes, aes(x = V1, y = V2)) + geom_point() + facet_wrap(~pearson)
Correlation | Strength | Direction |
---|---|---|
\(-1\) to \(-0.9\) | Very Strong | Negative |
\(-0.9\) to \(-0.7\) | Strong | Negative |
\(-0.7\) to \(-0.4\) | Moderate | Negative |
\(-0.4\) to \(-0.2\) | Weak | Negative |
\(-0.2\) to \(0\) | Negligible | Negative |
\(0\) to \(0.2\) | Negligible | Positive |
\(0.2\) to \(0.4\) | Weak | Positive |
\(0.4\) to \(0.7\) | Moderate | Positive |
\(0.7\) to \(0.9\) | Strong | Positive |
\(0.9\) to \(1\) | Very Strong | Positive |
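The table can be applied mechanically with `cut()`. The helper below, `describe_correlation()`, is hypothetical and not part of the original notes. Since adjacent ranges in the table share their endpoints, the sketch assumes that a boundary value such as \(0.7\) falls in the weaker category and that \(r = 0\) is labeled positive; other conventions are equally defensible.

```r
# Hypothetical helper (not from the source): label correlation
# coefficients using the cut points in the table above.
describe_correlation <- function(r) {
  stopifnot(all(abs(r) <= 1))
  strength <- cut(
    abs(r),
    breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
    labels = c("Negligible", "Weak", "Moderate", "Strong", "Very Strong"),
    include.lowest = TRUE   # so that |r| = 0 falls in the first bin
  )
  # Assumption: r = 0 is reported as "Positive".
  direction <- ifelse(r < 0, "Negative", "Positive")
  data.frame(
    r = r,
    strength = as.character(strength),
    direction = direction,
    stringsAsFactors = FALSE
  )
}

describe_correlation(c(-0.95, -0.5, 0.1, 0.8))
```

These labels are rules of thumb; whether a given correlation counts as strong depends on the field and the question being asked.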