Day | Monday | Tuesday | Wednesday | Thursday |
---|---|---|---|---|
Number of absences | \(15\) | \(12\) | \(9\) | \(9\) |
Suppose there are \(60\) absences in an average week. At the \(.05\) significance level, test whether these data fit a uniform distribution, that is, whether absences are equally likely on each day.
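A minimal sketch of this test in R. The table lists only four days totalling \(45\) absences, so this sketch assumes a five-day work week whose missing Friday count is the remaining \(60 - 45 = 15\); that value is an assumption, not data from the table. Under a uniform distribution each day expects \(60/5 = 12\) absences, and the statistic \(\chi^2 = \sum (O - E)^2 / E\) has \(5 - 1 = 4\) degrees of freedom.

```r
# Observed absences per day. The Friday value (15) is an ASSUMPTION:
# it is the count needed for the week to total 60 absences.
observed <- c(Monday = 15, Tuesday = 12, Wednesday = 9, Thursday = 9, Friday = 15)

# Under H0 (uniform), every day has the same expected count: 60 / 5 = 12
expected <- rep(sum(observed) / length(observed), length(observed))

# Goodness-of-fit statistic computed by hand
chi_sq   <- sum((observed - expected)^2 / expected)
dof      <- length(observed) - 1          # 5 categories - 1
critical <- qchisq(0.95, df = dof)        # right-tail critical value at alpha = .05
print(if (chi_sq > critical) "Reject H0" else "Fail to reject H0")

# Same test with base R; chisq.test() assumes equal probabilities when p is omitted
res <- chisq.test(observed)
print(res)
```

With these assumed counts the statistic is \(3\), below the critical value \(\chi^2_{.05,4} \approx 9.49\), so the test would fail to reject the hypothesis of a uniform distribution.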
Marital Status | Percent |
---|---|
never married | \(31\) |
married | \(56.5\) |
widowed | \(2.5\) |
divorced/separated | \(10\) |
From a random sample of \(400\) men aged \(18\) to \(24\), the following data were collected:
Marital Status | Count |
---|---|
never married | \(140\) |
married | \(238\) |
widowed | \(2\) |
divorced/separated | \(20\) |
Perform a goodness-of-fit test at the \(.05\) significance level to decide whether these counts fit the percent distribution above.
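A sketch of the same procedure in R, where the hypothesized proportions come from the percent table and are passed to `chisq.test()` through its `p` argument. The expected counts are \(400 \times (0.31, 0.565, 0.025, 0.10) = (124, 226, 10, 40)\), and there are \(4 - 1 = 3\) degrees of freedom.

```r
# Observed counts from the sample of 400 men aged 18 to 24
observed <- c(never_married = 140, married = 238, widowed = 2, divorced_separated = 20)

# Hypothesized proportions from the percent table (must sum to 1)
p0 <- c(0.31, 0.565, 0.025, 0.10)

expected <- sum(observed) * p0            # 124, 226, 10, 40
res      <- chisq.test(observed, p = p0)
critical <- qchisq(0.95, df = length(observed) - 1)

print(res)
print(if (unname(res$statistic) > critical) "Reject H0" else "Fail to reject H0")
```

The statistic works out to about \(19.10\), which exceeds \(\chi^2_{.05,3} \approx 7.81\), so these counts would lead to rejecting the hypothesized distribution at the \(.05\) level.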
Number of Televisions | Percent |
---|---|
\(0\) | \(10\) |
\(1\) | \(16\) |
\(2\) | \(55\) |
\(3\) | \(11\) |
\(4+\) | \(8\) |
A random sample of 600 families in North Carolina gave the following results:
Number of Televisions | Count |
---|---|
\(0\) | \(66\) |
\(1\) | \(119\) |
\(2\) | \(340\) |
\(3\) | \(60\) |
\(4+\) | \(15\) |
At the \(1 \%\) significance level, does it appear that the distribution of number of televisions in North Carolina is different from the distribution for the American population as a whole?
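A sketch in R using the same approach; what changes is the \(1\%\) significance level and the five categories, which give \(5 - 1 = 4\) degrees of freedom. The names `"0"` to `"4+"` are only labels.

```r
# Observed counts from the 600 North Carolina families
observed <- c("0" = 66, "1" = 119, "2" = 340, "3" = 60, "4+" = 15)

# National percent distribution, written as proportions
p0 <- c(0.10, 0.16, 0.55, 0.11, 0.08)

res      <- chisq.test(observed, p = p0)
critical <- qchisq(0.99, df = length(observed) - 1)   # alpha = .01

print(res$expected)   # 60, 96, 330, 66, 48
print(res)
print(if (res$p.value < 0.01) "Reject H0" else "Fail to reject H0")
```

Here the statistic is about \(29.65\), well above \(\chi^2_{.01,4} \approx 13.28\); most of it comes from the \(4+\) category, where \(15\) families were observed against \(48\) expected.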
Salary | No HS diploma | HS | College | Master's |
---|---|---|---|---|
\(< \$30,000\) | \(15\) | \(25\) | \(10\) | \(5\) |
\(\$30,000 - \$40,000\) | \(20\) | \(40\) | \(70\) | \(30\) |
\(\$40,000 - \$50,000\) | \(10\) | \(20\) | \(40\) | \(55\) |
\(\$50,000 - \$60,000\) | \(5\) | \(10\) | \(20\) | \(60\) |
\(> \$60,000\) | \(0\) | \(5\) | \(10\) | \(150\) |
\(H_0:\)
\(H_A:\)
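The salary table is a contingency table, so a natural choice is a \(\chi^2\) test of independence. A sketch in R, assuming the intended hypotheses are \(H_0\): salary and education level are independent, and \(H_A\): they are not independent. Each expected count is (row total)(column total)/(grand total), and there are \((5-1)(4-1) = 12\) degrees of freedom.

```r
# Counts from the salary-by-education table, entered row by row
salary <- matrix(
  c(15, 25, 10,   5,
    20, 40, 70,  30,
    10, 20, 40,  55,
     5, 10, 20,  60,
     0,  5, 10, 150),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Salary    = c("< $30,000", "$30,000-$40,000", "$40,000-$50,000",
                  "$50,000-$60,000", "> $60,000"),
    Education = c("No HS diploma", "HS", "College", "Master's")
  )
)

res <- chisq.test(salary)
print(round(res$expected, 2))   # expected counts under independence
print(res)
```

R may warn that the chi-squared approximation may be incorrect: the expected count for the \(< \$30,000\) / No HS diploma cell is \(55 \times 50 / 600 \approx 4.58\), below the usual rule of thumb of \(5\).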
The following section uses data from “Learning Statistics with R” by Danielle Navarro.
```r
# Load the data sets used in this section (files assumed to be in the working directory)
load("./parenthood.Rdata")
load("./pearson_correlations.RData")
load("./effort.Rdata")
```
Consider the following plots:
```r
library(ggplot2)
ggplot(parenthood, aes(x = dan.sleep, y = dan.grump)) + geom_point()   # grumpiness vs. Dan's sleep
ggplot(parenthood, aes(x = baby.sleep, y = dan.grump)) + geom_point()  # grumpiness vs. baby's sleep
```
The Pearson correlation coefficient \(r_{XY}\) is a standardized covariance measure:
\[r_{XY} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i-\bar{X}}{s_X}\frac{Y_i-\bar{Y}}{s_Y} \]
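where \(N\) is the number of paired observations, \(\bar{X}\) and \(\bar{Y}\) are the sample means, and \(s_X\) and \(s_Y\) are the sample standard deviations of \(X\) and \(Y\).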
In R, the Pearson correlation coefficient is computed with `cor()`, which uses `method = "pearson"` by default.
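A sketch that checks the formula against base R on a small made-up pair of vectors (hypothetical data, not from the text). It computes \(r_{XY}\) three ways: directly from the formula, as the covariance divided by \(s_X s_Y\), and with `cor()`.

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
N <- length(x)

# 1. Formula: sum of products of standardized scores, divided by N - 1
r_formula <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (N - 1)

# 2. Standardized covariance: cov(X, Y) / (s_X * s_Y)
r_cov <- cov(x, y) / (sd(x) * sd(y))

# 3. Built-in; method = "pearson" is the default
r_builtin <- cor(x, y)

print(c(formula = r_formula, covariance = r_cov, cor = r_builtin))  # all about 0.7746
```

With the parenthood data loaded, the corresponding call would be `cor(parenthood$dan.sleep, parenthood$dan.grump)`.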
The following data sets have various Pearson correlation coefficients:
```r
# One scatterplot panel per value of the `pearson` column
ggplot(outcomes, aes(x = V1, y = V2)) + geom_point() + facet_wrap(~pearson)
```
Correlation | Strength | Direction |
---|---|---|
\(-1\) to \(-0.9\) | Very Strong | Negative |
\(-0.9\) to \(-0.7\) | Strong | Negative |
\(-0.7\) to \(-0.4\) | Moderate | Negative |
\(-0.4\) to \(-0.2\) | Weak | Negative |
\(-0.2\) to \(0\) | Negligible | Negative |
\(0\) to \(0.2\) | Negligible | Positive |
\(0.2\) to \(0.4\) | Weak | Positive |
\(0.4\) to \(0.7\) | Moderate | Positive |
\(0.7\) to \(0.9\) | Strong | Positive |
\(0.9\) to \(1\) | Very Strong | Positive |
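The table can be turned into a small helper that labels a coefficient. The function name `describe_r` is hypothetical, and because the table's ranges share their endpoints, this sketch assumes a boundary value such as \(0.4\) belongs to the weaker category and treats \(r = 0\) as positive.

```r
# Hypothetical helper: label a Pearson correlation using the strength table.
# Boundary values fall into the weaker category, e.g. |r| = 0.4 -> "Weak".
describe_r <- function(r) {
  strength <- cut(abs(r),
                  breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
                  labels = c("Negligible", "Weak", "Moderate", "Strong", "Very Strong"),
                  include.lowest = TRUE)   # right-closed bins; 0 goes in the first bin
  direction <- ifelse(r < 0, "Negative", "Positive")
  paste(as.character(strength), direction)
}

print(describe_r(c(-0.95, -0.3, 0.1, 0.5, 0.8)))
```

`cut()` does the range lookup in one vectorized call, so the helper also works on a whole vector of coefficients.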