load("./parenthood.Rdata")
load("./pearson_correlations.RData")
load("./effort.Rdata")
The graph below displays the error between our prediction and the true data in red:
# Load ggplot2 for plotting
library(ggplot2)
# Use our slope and intercept guesses to predict the y-values
predictions <- -12 * parenthood$dan.sleep + 144
# Plot segments joining the actual points and the predicted points
ggplot(parenthood, aes(x = dan.sleep, y = dan.grump)) +
  geom_point() +
  geom_abline(intercept = 144, slope = -12, color = "blue") +
  geom_segment(aes(xend = dan.sleep, yend = predictions, color = "residual"))
The lengths of these little red segments are the sizes of the residuals. Each residual is the actual value minus the predicted value:
\[\epsilon_i = Y_i - \hat{Y}_i\]
Putting these ideas together, we can see that our real data is equal to our prediction plus the residuals:
\[Y_i = \hat{Y}_i + \epsilon_i = b_1X_i + b_0 + \epsilon_i\]
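To measure how well a line fits, we add up the squared residuals; the line of best fit is the one that makes this sum as small as possible:
\[ \sum \epsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2\]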
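As a rough illustration of the residual formulas, the sketch below computes the residuals and their sum of squares for the guessed line (slope -12, intercept 144), then does the same for the least-squares line from lm(). The vectors sleep and grump are made-up toy values used only for this example; they are not the parenthood data.

```r
# Hypothetical toy data (NOT the parenthood data set)
sleep <- c(5, 6, 7, 8, 9)        # X: hours of sleep
grump <- c(85, 70, 62, 50, 33)   # Y: grumpiness

# Guessed line from the text: slope -12, intercept 144
guess_pred <- -12 * sleep + 144
guess_eps  <- grump - guess_pred   # epsilon_i = Y_i - Y-hat_i

# The data equal the prediction plus the residual
stopifnot(isTRUE(all.equal(grump, guess_pred + guess_eps)))

# Sum of squared residuals for the guessed line
ssr_guess <- sum(guess_eps^2)

# Least-squares line from lm() and its sum of squared residuals
fit     <- lm(grump ~ sleep)
ssr_fit <- sum(residuals(fit)^2)

print(c(guess = ssr_guess, lm = ssr_fit))
```

On this toy data the lm() line should give the smaller sum, since least squares picks the slope and intercept that minimize it.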
The model we use for two predictors is: \[\hat{Y}_i = b_2 X_{i,2} + b_1 X_{i,1} + b_0\]
where \(X_{i,2}\) is the amount of sleep the baby got on day \(i\) and \(X_{i,1}\) is the amount of sleep Danielle got on day \(i\).
To find the best fitting regression line we use lm():
library(car)  # provides scatter3d()
library(rgl)  # provides rglwidget()
scatter3d(dan.grump ~ dan.sleep + baby.sleep, data = parenthood)
rglwidget()
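A minimal sketch of fitting the two-predictor model with lm() follows. The data frame toy and its columns are hypothetical stand-ins for the parenthood variables, chosen only so the example runs on its own. Following the definitions above, dan_sleep plays the role of \(X_{i,1}\) and baby_sleep the role of \(X_{i,2}\).

```r
# Hypothetical toy data (NOT the parenthood data set)
toy <- data.frame(
  dan_sleep  = c(5, 6, 7, 8, 9, 6.5),
  baby_sleep = c(8, 6, 10, 7, 9, 11),
  dan_grump  = c(85, 70, 62, 50, 33, 66)
)

# Fit Y-hat = b2 * X2 + b1 * X1 + b0
fit2 <- lm(dan_grump ~ dan_sleep + baby_sleep, data = toy)

# Named coefficients: (Intercept) = b0, dan_sleep = b1, baby_sleep = b2
b <- coef(fit2)
print(b)

# Rebuild the predictions by hand from the coefficients
manual <- b[["(Intercept)"]] +
  b[["dan_sleep"]]  * toy$dan_sleep +
  b[["baby_sleep"]] * toy$baby_sleep

# They should agree with the fitted values reported by lm()
print(all.equal(manual, unname(fitted(fit2))))
```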
\(\hat{b}_0 =\)
\(\hat{b}_1 =\)
Find the sum of the squared residuals for the model in exercise 1.
Use cor() to find the strength of the correlation between bill length and body mass.
Use simple linear regression to model the relationship between flipper length and body mass for the penguin data. What values do you get for \(\hat{b}_0\) and \(\hat{b}_1\)? Plot a scatter plot for the data with the line you found.
Use multiple linear regression to model the body mass using the predictors flipper length and bill length for the penguin data. Assuming \(X_1\) is flipper length and \(X_2\) is bill length, what values do you get for \(\hat{b}_0\), \(\hat{b}_1\), and \(\hat{b}_2\)?
# To find the line of best fit we use the lm() function:
ggplot(parenthood, aes(x=dan.sleep,y=dan.grump)) +
geom_point()
So, our model is
# show that (mean(x),mean(y)) is a point on the model
\[b_1 = \frac{s_y}{s_x}R\]
# compute the slope of the model with this shortcut
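The two checks named in the comments above can be sketched as follows. The vectors x and y are made-up toy values (not the parenthood data), and the sketch assumes \(s_x\) and \(s_y\) are the sample standard deviations returned by sd() and \(R\) is the Pearson correlation returned by cor().

```r
# Hypothetical toy data (NOT the parenthood data set)
x <- c(5, 6, 7, 8, 9)
y <- c(85, 70, 62, 50, 33)

# Least-squares fit and its coefficients
fit <- lm(y ~ x)
b0  <- coef(fit)[["(Intercept)"]]
b1  <- coef(fit)[["x"]]

# Check 1: (mean(x), mean(y)) is a point on the fitted line
print(all.equal(b0 + b1 * mean(x), mean(y)))

# Check 2: the slope shortcut b1 = (s_y / s_x) * R
b1_shortcut <- sd(y) / sd(x) * cor(x, y)
print(all.equal(b1_shortcut, b1))
```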
\(\hat{b}_0 =\)
\(\hat{b}_1 =\)
Use cor() to find \(R^2\) and compare this to what the linear model function lm() says it should be.
Use the formula for estimating \(b_1\) in the penguin flipper length data and verify that it is the same value given by lm().
Use simple linear regression to model the relationship between flipper length and body mass for the penguin data. What values do you get for \(\hat{b}_0\) and \(\hat{b}_1\)? Plot a scatter plot for the data with the line you found.
Use multiple linear regression to model the body mass using the explanatory variables flipper length and bill length for the penguin data. Assuming \(X_1\) is flipper length and \(X_2\) is bill length, what values do you get for \(\hat{b}_0\), \(\hat{b}_1\), and \(\hat{b}_2\)?