Day26 - Two Proportions

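When both samples are large enough, the difference in sample proportions is approximately normal, centered at \(p_1 - p_2\), with the standard error as the second parameter:

\[\hat{p}_1 - \hat{p}_2 \sim N \left( p_1 - p_2, \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \right) \]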
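As a rough check on this formula, the sketch below simulates many pairs of samples and compares the spread of the simulated differences with the standard error from the formula. The proportions, sample sizes, seed, and number of repetitions are made up for illustration and do not come from any survey.

```r
set.seed(26)  # arbitrary seed so the simulation is reproducible

# Hypothetical population proportions and sample sizes (assumptions for illustration)
p1 <- 0.30; n1 <- 400
p2 <- 0.40; n2 <- 350
reps <- 10000

# Simulate a sample proportion for each group, then take the difference
p_hat_1 <- rbinom(reps, size = n1, prob = p1) / n1
p_hat_2 <- rbinom(reps, size = n2, prob = p2) / n2
diffs <- p_hat_1 - p_hat_2

# Standard error from the formula
se_formula <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

mean(diffs)  # should be close to p1 - p2
sd(diffs)    # should be close to se_formula
se_formula
```

A histogram of the simulated differences, `hist(diffs)`, should also look roughly bell-shaped.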

A survey asked \(827\) randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Below is the distribution of responses, separated by whether or not the respondent graduated from college. Test, at a \(5\%\) significance level, whether the data provide statistically significant evidence that the proportions of college grads and non-college grads who don’t have an opinion on the matter are different.

	College Grad	Not College Grad
Support	\(154\)	\(132\)
Oppose	\(180\)	\(126\)
Don’t know	\(104\)	\(131\)
Total	\(438\)	\(389\)

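Let \(p_1\) and \(p_2\) be the true proportions of college grads and non-college grads, respectively, who do not know enough to say.

\(H_0: p_1 - p_2 = 0\) or \(p_1 = p_2\)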

\(H_A: p_1 - p_2 \neq 0\) or \(p_1 \neq p_2\)

Since the null hypothesis assumes that the two proportions are equal, the standard error needs a single “best” estimate of that common proportion. This estimate is called the pooled proportion.

\[\hat{p}_{\text{pooled}} = \frac{\text{number of successes in first group} + \text{number of successes in second group}}{\text{total sample size of both groups}}\]

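pooled <- (104 + 131) / (438 + 389)  # pooled proportion of "don't know" responses

se <- sqrt((pooled * (1 - pooled) / 438) + (pooled * (1 - pooled) / 389))  # standard error under H0

sample_diff <- (104 / 438) - (131 / 389)  # observed difference: college grad minus not college grad

z <- (sample_diff - 0) / se  # test statistic

2 * pnorm(-abs(z))  # two-sided p-value; correct whether z is negative or positive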

## [1] 0.001573334

# Our p-value is about 0.0016, which is less than 0.05, so we reject H0.
# The data provide statistically significant evidence that the two proportions are different.
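As a cross-check, base R's `prop.test()` runs the same two-sided test when the continuity correction is turned off with `correct = FALSE`. It reports a chi-squared statistic, which for two groups equals the square of the z statistic. The variable names below are illustrative.

```r
# Counts of "don't know" responses and group totals from the table
dont_know <- c(104, 131)
totals    <- c(438, 389)

# Two-sided test of equal proportions, without continuity correction
res <- prop.test(x = dont_know, n = totals, correct = FALSE)

res$p.value          # should agree with the hand-computed p-value
sqrt(res$statistic)  # absolute value of the z statistic
```

Leaving the default `correct = TRUE` applies a continuity correction, so the p-value would differ slightly from the hand calculation.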

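A confidence interval does not assume that the two proportions are equal, so you no longer need a pooled proportion. The standard error uses each group's own sample proportion instead.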

Now calculate a \(95 \%\) confidence interval for the difference between the proportion of college grads and non-college grads that don’t have an opinion on the matter.
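
One way to carry out this calculation is sketched below. It uses each group's own sample proportion in the standard error and takes group 1 to be college grads, matching the order used in the test. The variable names are illustrative.

```r
# Sample proportions of "don't know" responses
p_hat_1 <- 104 / 438  # college grads
p_hat_2 <- 131 / 389  # non-college grads

# Unpooled standard error for the difference
se_ci <- sqrt(p_hat_1 * (1 - p_hat_1) / 438 + p_hat_2 * (1 - p_hat_2) / 389)

# Critical value for 95% confidence
z_star <- qnorm(0.975)

# Point estimate plus or minus the margin of error
ci <- (p_hat_1 - p_hat_2) + c(-1, 1) * z_star * se_ci
ci
```

The interval comes out to roughly \((-0.161, -0.038)\). Because it lies entirely below zero, it suggests that the share of college grads without an opinion is about 4 to 16 percentage points lower than the share of non-college grads.
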
According to a report on sleep deprivation by the Centers for Disease Control and Prevention, the proportion of California residents who reported insufficient rest or sleep during each of the preceding \(30\) days is \(8.0 \%\), while this proportion is \(8.8 \%\) for Oregon residents. These data are based on simple random samples of \(11{,}545\) California and \(4{,}691\) Oregon residents. Determine, at a \(10 \%\) significance level, whether the data provide evidence that the proportions are different.
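
A sketch of this test follows the same steps as the drilling example. Only the percentages are reported, so the pooled proportion is computed here as a sample-size-weighted average of the two sample proportions rather than from raw counts. Taking California as group 1 is an arbitrary choice, and the variable names are illustrative.

```r
# Reported sample proportions and sample sizes
p_hat_ca <- 0.080; n_ca <- 11545  # California
p_hat_or <- 0.088; n_or <- 4691   # Oregon

# Pooled proportion: weighted average, equivalent to total successes / total sample size
pooled <- (p_hat_ca * n_ca + p_hat_or * n_or) / (n_ca + n_or)

# Standard error under H0: the two proportions are equal
se <- sqrt(pooled * (1 - pooled) / n_ca + pooled * (1 - pooled) / n_or)

z <- (p_hat_ca - p_hat_or - 0) / se
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

z
p_value
p_value < 0.10  # TRUE means reject H0 at the 10% level
```

With these inputs, z is about \(-1.68\) and the p-value is about \(0.093\), so the difference is statistically significant at the \(10 \%\) level, though it would not be at the \(5 \%\) level.
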
Calculate a \(95 \%\) confidence interval for the difference between the proportions of Californians and Oregonians who are sleep deprived and interpret it in the context of the data.
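
A possible sketch of this interval, using the unpooled standard error and the reported percentages directly, is shown below. The difference is taken as California minus Oregon, which is an arbitrary choice, and the variable names are illustrative.

```r
# Reported sample proportions and sample sizes
p_hat_ca <- 0.080; n_ca <- 11545  # California
p_hat_or <- 0.088; n_or <- 4691   # Oregon

# Unpooled standard error for the difference
se_ci <- sqrt(p_hat_ca * (1 - p_hat_ca) / n_ca + p_hat_or * (1 - p_hat_or) / n_or)

# 95% confidence interval for p_ca - p_or
ci <- (p_hat_ca - p_hat_or) + c(-1, 1) * qnorm(0.975) * se_ci
ci
```

The interval is roughly \((-0.0175, 0.0015)\). In context: we are \(95 \%\) confident that the proportion of Californians who are sleep deprived is between about 1.75 percentage points lower and 0.15 percentage points higher than the proportion of Oregonians. Note that a \(95 \%\) interval corresponds to a two-sided test at the \(5 \%\) level, not the \(10 \%\) level used in the hypothesis test.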