| | Monday | Tuesday | Wednesday | Thursday |
|---|---|---|---|---|
| number of absences | 15 | 12 | 9 | 9 |
Suppose there are \(60\) absences in an average week. Test the goodness of fit of this data to a uniform distribution with a significance level of \(.05\).
| Marital Status | Percent |
|---|---|
| never married | \(31\) |
| married | \(56.5\) |
| widowed | \(2.5\) |
| divorced/separated | \(10\) |
From a random sample of \(400\) men ages \(18\) to \(24\), the following data is collected:
| Marital Status | Count |
|---|---|
| never married | \(140\) |
| married | \(238\) |
| widowed | \(2\) |
| divorced/separated | \(20\) |
Perform a goodness of fit test with significance level of \(.05\).
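One way to carry out this test in R is with the built-in `chisq.test()` function, passing the observed counts and the hypothesized proportions. The sketch below assumes the percentages in the first table are the null-hypothesis distribution; the variable names are illustrative and not part of the original exercise.

```r
# Observed counts from the sample of 400 (second table above)
observed <- c(never_married = 140, married = 238,
              widowed = 2, divorced_separated = 20)

# Hypothesized proportions (first table above, converted from percent)
expected_prop <- c(0.31, 0.565, 0.025, 0.10)

# Expected counts under the null hypothesis: 124, 226, 10, 40
expected <- sum(observed) * expected_prop

# Chi-square goodness-of-fit test
fit <- chisq.test(x = observed, p = expected_prop)

fit$statistic        # chi-square statistic, about 19.10
fit$parameter        # degrees of freedom: 4 categories - 1 = 3
fit$p.value < 0.05   # TRUE means the null is rejected at the 0.05 level
```

Computing `expected` separately is not required by `chisq.test()`, but it makes it easy to check that every expected count is at least 5, the usual rule of thumb for the chi-square approximation.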
| Number of Televisions | Percent |
|---|---|
| \(0\) | \(10\) |
| \(1\) | \(16\) |
| \(2\) | \(55\) |
| \(3\) | \(11\) |
| \(4+\) | \(8\) |
A random sample of 600 families in North Carolina gave the following results:
| Number of Televisions | Count |
|---|---|
| \(0\) | \(66\) |
| \(1\) | \(119\) |
| \(2\) | \(340\) |
| \(3\) | \(60\) |
| \(4+\) | \(15\) |
At the \(1 \%\) significance level, does it appear that the distribution of number of televisions in North Carolina is different from the distribution for the American population as a whole?
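The same kind of test can also be worked by hand, comparing the statistic \(\sum (O - E)^2 / E\) to a critical value. The following is a minimal sketch of that approach for the television data, assuming the percent table gives the national distribution; the variable names are illustrative.

```r
# National percentages and North Carolina counts from the two tables above
pop_percent <- c("0" = 10, "1" = 16, "2" = 55, "3" = 11, "4+" = 8)
observed    <- c("0" = 66, "1" = 119, "2" = 340, "3" = 60, "4+" = 15)

n        <- sum(observed)            # 600 families
expected <- n * pop_percent / 100    # 60, 96, 330, 66, 48

# Test statistic: sum of (O - E)^2 / E over the categories
chi_sq <- sum((observed - expected)^2 / expected)

# Critical value for alpha = 0.01 with (categories - 1) degrees of freedom
deg_free <- length(observed) - 1
critical <- qchisq(0.99, df = deg_free)

chi_sq              # about 29.65
critical            # about 13.28
chi_sq > critical   # TRUE means the null is rejected at the 1% level
```

Passing the same inputs to `chisq.test(observed, p = pop_percent / 100)` should give the same statistic along with a p-value.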
| Salary | No HS diploma | HS | College | Masters |
|---|---|---|---|---|
| \(< \$30,000\) | \(15\) | \(25\) | \(10\) | \(5\) |
| \(30\)k\(-40\)k | \(20\) | \(40\) | \(70\) | \(30\) |
| \(40\)k\(-50\)k | \(10\) | \(20\) | \(40\) | \(55\) |
| \(50\)k\(-60\)k | \(5\) | \(10\) | \(20\) | \(60\) |
| \(> \$60,000\) | \(0\) | \(5\) | \(10\) | \(150\) |
\(H_0:\)
\(H_A:\)
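Assuming the salary-by-education table is meant for a chi-square test of independence (the text above does not say so explicitly), the computation could be sketched in R as follows. The matrix name and the shortened row labels are illustrative choices.

```r
# Contingency table: rows are salary bands, columns are education levels
salary_by_education <- matrix(
  c(15, 25, 10,   5,
    20, 40, 70,  30,
    10, 20, 40,  55,
     5, 10, 20,  60,
     0,  5, 10, 150),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    salary    = c("<30k", "30k-40k", "40k-50k", "50k-60k", ">60k"),
    education = c("No HS diploma", "HS", "College", "Masters")
  )
)

# Chi-square test of independence on the two-way table.
# R may warn that the approximation is questionable, because at least one
# expected count is below 5 (for example 55 * 50 / 600, about 4.58).
result <- chisq.test(salary_by_education)

result$expected    # expected counts: row total * column total / grand total
result$parameter   # degrees of freedom: (5 - 1) * (4 - 1) = 12
result$p.value     # compare with the chosen significance level
```

Inspecting `result$expected` before reading the p-value is a useful habit, since small expected counts weaken the chi-square approximation.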
The following section uses data from “Learning Statistics with R” by Danielle Navarro:
load("./parenthood.Rdata")
load("./pearson_correlations.RData")
load("./effort.Rdata")
Consider the following plots:
library(ggplot2)
ggplot(parenthood, aes(x=dan.sleep, y=dan.grump)) + geom_point()
ggplot(parenthood, aes(x=baby.sleep, y=dan.grump)) + geom_point()
The Pearson correlation coefficient \(r_{XY}\) is a standardized covariance measure:
\[r_{XY} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i-\bar{X}}{s_X}\frac{Y_i-\bar{Y}}{s_Y} \]
where \(s_X\) and \(s_Y\) are the sample standard deviations of \(X\) and \(Y\).
The R code for the Pearson correlation coefficient is:
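As a minimal sketch, base R's `cor()` returns the Pearson coefficient by default, and the result can be checked against the formula above. The vectors `x` and `y` below are hypothetical values made up for illustration, not the `parenthood` data.

```r
# Hypothetical paired measurements, made up for illustration only
x <- c(7.6, 7.9, 5.1, 7.8, 6.4, 6.5, 8.1, 6.0)
y <- c(56, 60, 82, 55, 67, 72, 53, 75)

# Direct translation of the formula: the sum of products of
# standardized scores, divided by N - 1
n        <- length(x)
r_manual <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (n - 1)

# Built-in function; method = "pearson" is the default
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)   # TRUE: both give the same coefficient

# With the data loaded earlier, the analogous call would be:
# cor(parenthood$dan.sleep, parenthood$dan.grump)
```

Note that `sd()` uses the \(N-1\) denominator, which matches the sample standard deviations in the formula.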
Below is data with various Pearson correlation coefficients:
ggplot(outcomes, aes(x=V1,y=V2))+geom_point() + facet_wrap(~pearson)
| Correlation | Strength | Direction |
|---|---|---|
| \(-1\) to \(-0.9\) | Very Strong | Negative |
| \(-0.9\) to \(-0.7\) | Strong | Negative |
| \(-0.7\) to \(-0.4\) | Moderate | Negative |
| \(-0.4\) to \(-0.2\) | Weak | Negative |
| \(-0.2\) to \(0\) | Negligible | Negative |
| \(0\) to \(0.2\) | Negligible | Positive |
| \(0.2\) to \(0.4\) | Weak | Positive |
| \(0.4\) to \(0.7\) | Moderate | Positive |
| \(0.7\) to \(0.9\) | Strong | Positive |
| \(0.9\) to \(1\) | Very Strong | Positive |
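The table can be turned into a small lookup helper. The function below is a hypothetical sketch, not part of the original material. Because the table lists each boundary value in two rows, it assumes a boundary such as \(0.4\) takes the weaker label, and it labels \(r = 0\) as positive.

```r
# Hypothetical helper: label correlations using the table above
describe_correlation <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  # Bin |r| into the five strength bands; intervals are closed on the right,
  # so a boundary value such as 0.4 falls into the weaker band
  strength <- cut(
    abs(r),
    breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
    labels = c("Negligible", "Weak", "Moderate", "Strong", "Very Strong"),
    include.lowest = TRUE
  )
  # Assumption: r = 0 is reported as "Positive"
  direction <- ifelse(r < 0, "Negative", "Positive")
  data.frame(r = r,
             strength = as.character(strength),
             direction = direction,
             stringsAsFactors = FALSE)
}

describe_correlation(c(-0.95, -0.5, 0.1, 0.8))
```

These labels are conventions for describing a coefficient rather than statistical tests, so the cut-offs are best treated as rough guides.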