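In this lab you will explore how to compare means across many groups. We have already learned ways to test for the difference between two means. Now we will learn about a new tool called ANalysis Of VAriance, or ANOVA for short. Before performing ANOVA we must check three criteria:
the observations are independent within and across groups
the data within each group are approximately normal
the variability is roughly equal across groups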
Load the palmerpenguins library and save the penguins data set locally, removing the data points with no flipper length data. We can assume that the penguin observations are independent.
library(palmerpenguins)
# keep only the rows where flipper length was recorded
penguins <- filter(penguins, !is.na(flipper_length_mm))
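As a small illustration of how removing missing values with filter() works, here is a minimal sketch on a made-up data frame. The `toy` data and its values are assumptions for illustration only and are not part of the penguins data.

```r
library(dplyr)

# Hypothetical toy data: one flipper length is missing
toy <- data.frame(
  id = 1:4,
  flipper_length_mm = c(181, NA, 195, 210)
)

# First argument: the data set; second argument: the condition.
# is.na() flags missing values, and ! negates it, so only complete rows stay.
toy_complete <- filter(toy, !is.na(flipper_length_mm))

nrow(toy)           # number of rows before filtering
nrow(toy_complete)  # number of rows after filtering
```

Testing for missing values with is.na() is usually clearer than comparing against the text "NA", because a missing value is not the same thing as the character string "NA".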
The line that saves penguins uses the filter() function. The first argument is the data set that you want to filter from and the second argument is the condition you want to filter by. The filter() function is part of a package called dplyr. This package is included in the tidyverse, a large collection of packages that we already loaded in line 19.
Any ANOVA test we run will always have the same hypotheses:
\(H_0\): the mean is the same across all groups
\(H_A\): at least one mean is different
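In symbols, the same hypotheses can be sketched as follows. The notation is an assumption introduced here for illustration: \(k\) is the number of groups and \(\mu_i\) is the true mean of group \(i\).

\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
\qquad
H_A: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j
\]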
\(H_0:\)
\(H_A:\)
Plot the data with \(x\)-axis species and \(y\)-axis flipper_length_mm. Do the data seem approximately normal? Does it look like there is a significant difference between the median flipper lengths of the species?
# insert code here
Using filter(), save three new data sets, one for each species of penguin: Adelie, Chinstrap, and Gentoo.
# insert code here
# insert code here
# insert code here
# insert code here
# insert code here
# insert code here
Since the \(p\)-value is very small (so small that R reports it as zero), we reject the null hypothesis that the average flipper length is the same for every species. The data support a statistically significant difference in average flipper length across species.
Luckily, we don’t need to go through all of those steps in order to calculate ANOVA in R. The following code will calculate all of the values we calculated above. Run this code and make sure the values above match with the values below.
summary(aov(flipper_length_mm ~ species, data=penguins))
##              Df Sum Sq Mean Sq F value Pr(>F)
## species       2  52473   26237   594.8 <2e-16 ***
## Residuals   339  14953      44
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
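The hand calculation that aov() replaces can be sketched as follows. This is a minimal sketch on R's built-in PlantGrowth data rather than the penguins, and the variable names (ssg, sse, msg, mse, f_stat) are chosen here for illustration, not taken from the lab.

```r
# Toy example on the built-in PlantGrowth data (weight measured in 3 groups)
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

n <- length(y)    # total number of observations
k <- nlevels(g)   # number of groups
grand_mean <- mean(y)

group_means <- tapply(y, g, mean)    # mean of each group
group_sizes <- tapply(y, g, length)  # number of observations in each group

# Sum of squares between groups (SSG) and within groups (SSE)
ssg <- sum(group_sizes * (group_means - grand_mean)^2)
sse <- sum((y - ave(y, g))^2)  # ave() replaces each value by its group mean

# Mean squares, F statistic, and upper-tail p-value
msg <- ssg / (k - 1)
mse <- sse / (n - k)
f_stat <- msg / mse
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same quantities from the built-in ANOVA table
aov_table <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
c(manual = f_stat, aov = aov_table[["F value"]][1])
```

The degrees of freedom \(k - 1\) and \(n - k\) correspond to the Df column of the table, which is why the penguin output shows 2 and 339.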
Take the data set mlb_players_18 and filter the data to only contain outfielders (outfielders have position LF, RF, or CF).
# insert code here
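Filtering on several allowed values at once can be sketched with the %in% operator. The `roster` data frame below is hypothetical and made up for illustration; it is not the mlb_players_18 data.

```r
library(dplyr)

# Hypothetical toy roster; names and positions are made up
roster <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  position = c("LF", "1B", "CF", "C", "RF")
)

# %in% is TRUE when position matches any value in the vector on the right
outfield <- filter(roster, position %in% c("LF", "RF", "CF"))
outfield$name
```

This avoids chaining three separate comparisons with `|`, and the list of allowed values is easy to extend.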
Examine the variable AVG across the three groups. Compute the variance of AVG for each of the three groups (LF, RF, CF). Do the data satisfy the criteria to perform ANOVA?
# insert code here
# insert code here
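One way to compare variability across groups is a grouped summary. The sketch below again uses the built-in PlantGrowth data as a stand-in, and the column names (group_n, group_var, group_sd) are chosen here for illustration. Comparing the largest and smallest standard deviations is only an informal check, not a formal test.

```r
library(dplyr)

data(PlantGrowth)

# One row per group: sample size, variance, and standard deviation
group_summary <- PlantGrowth %>%
  group_by(group) %>%
  summarise(
    group_n   = n(),
    group_var = var(weight),
    group_sd  = sd(weight)
  )
group_summary

# Informal check: how much larger is the biggest spread than the smallest?
sd_ratio <- max(group_summary$group_sd) / min(group_summary$group_sd)
sd_ratio
```

A ratio close to 1 suggests similar variability across groups, while a large ratio is a reason to look more carefully before relying on ANOVA.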