Suppose we consider a model
\[\hat{Y}_i = b_1 X_i + b_0\]
Our hope is that our estimates are close to the actual values. That is, we hope that:
\[Y_i = b_1 X_i + b_0 + \epsilon\]
We think of the error term \(\epsilon\) as encapsulating some random variation in the response variable.
In a hypothesis test, this model is our alternative hypothesis.
The null model will be our null hypothesis. In the null model, there is no relationship between the explanatory variable and the response variable (so \(b_1 = 0\)).
\(H_0: Y_i = b_0 + \epsilon\): the null model is better at predicting \(Y_i\)

\(H_A: Y_i = b_1 X_i + b_0 + \epsilon\): the alternative model is better at predicting \(Y_i\)
# Load the parenthood data and pull out the variables we need:
load("~/Documents/GitHub/MAT104-Fall24/Week12/parenthood.Rdata")
dansleep <- parenthood$dan.sleep
babysleep <- parenthood$baby.sleep
grump <- parenthood$dan.grump

# Simulate 100 student GPAs that have nothing to do with grumpiness:
set.seed(33)
GPA <- rnorm(100, 3, .3)
\(H_0:\) Danielle’s grumpiness is due to random variation

\(H_A:\) Danielle’s sleep predicts her grumpiness better than random variation.
# To find the line of best fit we use the lm() function:
summary(lm(grump ~ dansleep))
##
## Call:
## lm(formula = grump ~ dansleep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.025 -2.213 -0.399 2.681 11.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 125.9563 3.0161 41.76 <2e-16 ***
## dansleep -8.9368 0.4285 -20.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.332 on 98 degrees of freedom
## Multiple R-squared: 0.8161, Adjusted R-squared: 0.8142
## F-statistic: 434.9 on 1 and 98 DF, p-value: < 2.2e-16
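Rather than reading numbers off the printed summary, we can also pull them out of the fitted model programmatically. A small sketch, using R's built-in `cars` data (`dist ~ speed`) as a stand-in since the parenthood file may not be at hand:

```r
# Fit a simple regression on R's built-in cars data:
model <- lm(dist ~ speed, data = cars)

# coef() returns a named vector of the estimates:
coef(model)
slope <- coef(model)["speed"]

# summary()$coefficients is the coefficient table as a matrix,
# with columns Estimate, Std. Error, t value, Pr(>|t|):
ctab <- summary(model)$coefficients
p_value <- ctab["speed", "Pr(>|t|)"]

# R-squared is also stored in the summary object:
r_squared <- summary(model)$r.squared
```

The same extraction works on `lm(grump ~ dansleep)` once the data is loaded.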
summary(lm(parenthood$dan.grump~GPA))
##
## Call:
## lm(formula = parenthood$dan.grump ~ GPA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.043 -6.896 -1.833 7.431 27.722
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.117 10.172 5.812 7.71e-08 ***
## GPA 1.522 3.354 0.454 0.651
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.09 on 98 degrees of freedom
## Multiple R-squared: 0.002097, Adjusted R-squared: -0.008086
## F-statistic: 0.2059 on 1 and 98 DF, p-value: 0.651
Since we get a p-value of \(.651\), there is no significant evidence that these 100 randomly generated GPAs predict Danielle’s grumpiness. In fact, we can see this in a scatter plot:
library(ggplot2)
df <- data.frame(grump, GPA)
ggplot(df, aes(x = GPA, y = grump)) + geom_point() + geom_abline(intercept = 59.117, slope = 1.522)
Returning to Danielle’s sleep, the fitted model gives the line of best fit:
\[\hat{Y}_i = -8.9368 X_i + 125.9563\]
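One way to read the fitted line is as a prediction machine: plug in hours of sleep, get predicted grumpiness. A small sketch with the coefficients hard-coded from the summary output above (`predict_grump` is just an illustrative name):

```r
# Coefficients taken from the summary(lm(grump ~ dansleep)) output:
b0 <- 125.9563
b1 <- -8.9368

# Predicted grumpiness for a given number of hours of sleep:
predict_grump <- function(hours) b1 * hours + b0

predict_grump(6)   # about 72.3: six hours of sleep, fairly grumpy
predict_grump(9)   # about 45.5: nine hours of sleep, much less grumpy
```

With the model object in hand, `predict()` on the `lm` fit does the same thing.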
summary(lm(grump ~ dansleep))
##
## Call:
## lm(formula = grump ~ dansleep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.025 -2.213 -0.399 2.681 11.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 125.9563 3.0161 41.76 <2e-16 ***
## dansleep -8.9368 0.4285 -20.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.332 on 98 degrees of freedom
## Multiple R-squared: 0.8161, Adjusted R-squared: 0.8142
## F-statistic: 434.9 on 1 and 98 DF, p-value: < 2.2e-16
summary(lm(grump ~ babysleep))
##
## Call:
## lm(formula = grump ~ babysleep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.4190 -5.0049 -0.0587 4.9567 23.7275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 85.7817 3.3528 25.585 < 2e-16 ***
## babysleep -2.7421 0.4035 -6.796 8.45e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.327 on 98 degrees of freedom
## Multiple R-squared: 0.3203, Adjusted R-squared: 0.3134
## F-statistic: 46.18 on 1 and 98 DF, p-value: 8.448e-10
summary(lm(grump ~ dansleep + babysleep))
##
## Call:
## lm(formula = grump ~ dansleep + babysleep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.0345 -2.2198 -0.4016 2.6775 11.7496
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 125.96557 3.04095 41.423 <2e-16 ***
## dansleep -8.95025 0.55346 -16.172 <2e-16 ***
## babysleep 0.01052 0.27106 0.039 0.969
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.354 on 97 degrees of freedom
## Multiple R-squared: 0.8161, Adjusted R-squared: 0.8123
## F-statistic: 215.2 on 2 and 97 DF, p-value: < 2.2e-16
Looking at the multiple regression model, we see that the p-value specifically for the baby’s sleep is very high. So, once Danielle’s sleep is taken into account, the baby’s sleep does not help predict Danielle’s grumpiness better than random variation.
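The same "does the extra predictor help?" question can be asked with an F-test comparing nested models via `anova()`. Sketched here on R's built-in `mtcars` data as a stand-in (where, unlike `babysleep`, the added predictor does help):

```r
# Nested models: does adding cyl improve on wt alone?
simple <- lm(mpg ~ wt, data = mtcars)
full   <- lm(mpg ~ wt + cyl, data = mtcars)

# anova() on two nested fits runs an F-test of the added term;
# a small p-value means the bigger model predicts significantly better.
anova(simple, full)
```

Running `anova(lm(grump ~ dansleep), lm(grump ~ dansleep + babysleep))` on the parenthood data would give a large p-value, matching the conclusion above.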
-8.9368 - qt(.975,97)*.4285
## [1] -9.787254
-8.9368 + qt(.975,97)*.4285
## [1] -8.086346
# We are 95% confident that the coefficient for Danielle's sleep is between -9.79 and -8.09
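The interval above can be wrapped into one step. One hedged note: the simple regression `grump ~ dansleep` has \(n - 2 = 98\) residual degrees of freedom, so `qt(.975, 98)` is the exact multiplier; with \(n = 100\) the difference from using 97 is negligible. With the model object in hand, `confint(lm(grump ~ dansleep))` computes the interval directly. A self-contained sketch using the estimate and standard error from the summary (`ci_slope` is just an illustrative name):

```r
# 95% confidence interval for a coefficient from its estimate,
# standard error, and residual degrees of freedom:
ci_slope <- function(est, se, df, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df)
  c(lower = est - t_crit * se, upper = est + t_crit * se)
}

ci_slope(-8.9368, 0.4285, df = 98)   # roughly (-9.79, -8.09)
```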