Day 32 – Chi-Square III: Homogeneity

Today’s Agenda

Using Chi-Square table

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of $60$ managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as:

	Monday	Tuesday	Wednesday	Thursday	Friday
number of absences	15	12	9	9	15

Suppose there are $60$ absences in an average week. Test the goodness of fit of this data to a uniform distribution with a significance level of $.05$.
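One way to sketch this test in R is to compute the statistic by hand and compare it with the value you would read off a chi-square table. The Friday count of 15 is taken here as the value that makes the five days total the stated 60; the variable names are illustrative only.

```r
# Observed counts; Friday = 15 is assumed so that the week totals 60.
observed <- c(Monday = 15, Tuesday = 12, Wednesday = 9, Thursday = 9, Friday = 15)

# Under a uniform distribution every day has the same expected count.
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-square statistic: sum over categories of (O - E)^2 / E.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one.
df <- length(observed) - 1

# Critical value for a 0.05 significance level (the chi-square table entry).
critical <- qchisq(0.95, df)

# p-value: upper-tail area beyond the observed statistic.
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# Reject H0 only if the statistic exceeds the critical value.
reject_h0 <- chi_sq > critical
```

Under these assumptions the statistic falls below the critical value, so the data would not contradict a uniform distribution at the $.05$ level.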

Suppose it is known that the distribution of marital status among males ages $18$ to $24$ in the U.S. population is as follows:

Marital Status	Percent
never married	$31$
married	$56.5$
widowed	$2.5$
divorced/separated	$10$

From a random sample of $400$ men ages $18$ to $24$, the following data is collected:

Marital Status	Count
never married	$140$
married	$238$
widowed	$2$
divorced/separated	$20$

Perform a goodness of fit test with significance level of $.05$.
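When the hypothesized distribution is not uniform, base R's `chisq.test()` accepts the hypothesized proportions through its `p` argument. The sketch below is one possible way to set this up; the variable names are illustrative and not part of the original exercise.

```r
# Observed counts from the sample of 400.
observed <- c(never_married = 140, married = 238,
              widowed = 2, divorced_separated = 20)

# Hypothesized population percentages, in the same category order.
population_pct <- c(31, 56.5, 2.5, 10)

# Goodness-of-fit test against the hypothesized proportions.
fit <- chisq.test(observed, p = population_pct / 100)

fit$expected   # expected counts under H0
fit$statistic  # chi-square statistic
fit$parameter  # degrees of freedom
fit$p.value    # compare with the 0.05 significance level
```

Checking `fit$expected` first is a useful habit, since the chi-square approximation is usually trusted only when every expected count is at least 5.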

One study indicates that the number of televisions that American families have is distributed as follows (this is the given distribution for the American population):

Number of Televisions	Percent
$0$	$10$
$1$	$16$
$2$	$55$
$3$	$11$
$4+$	$8$

A random sample of 600 families in North Carolina gave the following results:

Number of Televisions	Count
$0$	$66$
$1$	$119$
$2$	$340$
$3$	$60$
$4+$	$15$

At the $1 \%$ significance level, does it appear that the distribution of number of televisions in North Carolina is different from the distribution for the American population as a whole?
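Beyond the overall decision, it can help to see which category contributes most to the statistic. The following sketch computes the per-category contributions by hand; the variable names are illustrative assumptions.

```r
# Observed counts for the 600 North Carolina families.
observed <- c("0" = 66, "1" = 119, "2" = 340, "3" = 60, "4+" = 15)

# Hypothesized percentages for the American population.
population_pct <- c(10, 16, 55, 11, 8)

# Expected counts if North Carolina matched the national distribution.
expected <- sum(observed) * population_pct / 100

# Contribution of each category to the chi-square statistic.
contribution <- (observed - expected)^2 / expected
chi_sq <- sum(contribution)

# Critical value at the 1% significance level.
critical <- qchisq(0.99, df = length(observed) - 1)

# Category with the largest contribution.
largest <- names(which.max(contribution))
```

Comparing `chi_sq` with `critical` gives the test decision, while `contribution` shows where the sample departs most from the national percentages.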

Suppose that $600$ thirty-year-olds were surveyed to determine whether or not there is a relationship between the highest education completed and salary. Conduct a test of independence.

Salary	No HS diploma	HS	College	Masters
$< \$30,000$	$15$	$25$	$10$	$5$
$30$k$-40$k	$20$	$40$	$70$	$30$
$40$k$-50$k	$10$	$20$	$40$	$55$
$50$k$-60$k	$5$	$10$	$20$	$60$
$> \$60,000$	$0$	$5$	$10$	$150$

$H_0:$

$H_A:$
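For a test of independence, the expected count in each cell is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). A minimal sketch of that arithmetic for the table above, with illustrative row and column labels:

```r
# Observed counts: rows are salary bands, columns are education levels.
observed <- matrix(
  c(15, 25, 10,   5,
    20, 40, 70,  30,
    10, 20, 40,  55,
     5, 10, 20,  60,
     0,  5, 10, 150),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    salary = c("<30k", "30k-40k", "40k-50k", "50k-60k", ">60k"),
    education = c("No HS diploma", "HS", "College", "Masters")
  )
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic and degrees of freedom.
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value from the upper tail of the chi-square distribution.
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)
```

Calling `chisq.test(observed)` should report the same statistic, though R warns about the approximation here because the smallest expected count is below 5.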

Correlation

The following section uses data from “Learning Statistics with R” by Danielle Navarro:

library(ggplot2)  # needed for the ggplot() calls below
load("./parenthood.Rdata")
load("./pearson_correlations.RData")
load("./effort.Rdata")

This data captures how grumpy Danielle is, how much she slept in a day, and how much her baby slept in a day.

Consider the following plots:

ggplot(parenthood, aes(x=dan.sleep, y=dan.grump)) + geom_point()

ggplot(parenthood, aes(x=baby.sleep, y=dan.grump)) + geom_point()

Correlation coefficient

The Pearson correlation coefficient $r_{XY}$ is a standardized covariance measure:

\[r_{XY} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i-\bar{X}}{s_X}\frac{Y_i-\bar{Y}}{s_Y} \]

where $s_X$ and $s_Y$ are the sample standard deviations of $X$ and $Y$.

The R code for the Pearson correlation coefficient is:
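A minimal sketch, using a small made-up data set rather than the parenthood data (the vectors `x` and `y` are hypothetical), that applies the formula directly and checks it against base R's `cor()`:

```r
# Hypothetical paired measurements, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Standardize each variable with its sample mean and sample standard deviation.
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Pearson correlation from the formula: sum of products divided by N - 1.
r_manual <- sum(z_x * z_y) / (n - 1)

# Built-in equivalent; method = "pearson" is the default.
r_builtin <- cor(x, y)
```

With the course data loaded, the same call would be `cor(parenthood$dan.sleep, parenthood$dan.grump)`.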

Properties of $r_{XY}$

Interpreting the Pearson correlation coefficient

Below is data with various Pearson correlation coefficients:

ggplot(outcomes, aes(x=V1,y=V2))+geom_point() + facet_wrap(~pearson)

Exactly what constitutes a strong correlation depends on the context.
You can, however, use these general guidelines:

Correlation	Strength	Direction
$-1$ to $-0.9$	Very Strong	Negative
$-0.9$ to $-0.7$	Strong	Negative
$-0.7$ to $-0.4$	Moderate	Negative
$-0.4$ to $-0.2$	Weak	Negative
$-0.2$ to $0$	Negligible	Negative
$0$ to $0.2$	Negligible	Positive
$0.2$ to $0.4$	Weak	Positive
$0.4$ to $0.7$	Moderate	Positive
$0.7$ to $0.9$	Strong	Positive
$0.9$ to $1$	Very Strong	Positive
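The guidelines above can be turned into a small lookup. The function name `describe_correlation` is hypothetical, and two choices are assumptions not settled by the table: a boundary value such as $0.2$ is assigned to the stronger category, and $r = 0$ is labeled positive.

```r
# Map correlation values to the strength and direction labels in the table.
describe_correlation <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  # Strength depends only on the magnitude of r.
  strength <- cut(
    abs(r),
    breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
    labels = c("Negligible", "Weak", "Moderate", "Strong", "Very Strong"),
    right = FALSE,          # intervals are closed on the left: [0, 0.2), ...
    include.lowest = TRUE   # so that |r| = 1 falls in the last interval
  )
  # Direction depends only on the sign (zero is treated as positive here).
  direction <- ifelse(r < 0, "Negative", "Positive")
  paste(strength, direction)
}

labels <- describe_correlation(c(-0.95, 0.55, 0.1))
```

These labels are only a rough guide; as noted, what counts as strong depends on the context.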
