Today’s Agenda
- Quantiles
- Logic Operators
Quantiles
- The xth quantile (percentile) is a value such that x% of the values
in a data set are smaller than it and the remaining (100 - x)% are
larger.
- For example, the 10th quantile is a value such that 10% of the data is
smaller and the other 90% is larger.
- The 50th quantile has a special name: the median.
- \(Q_1\) and \(Q_3\) are the 25th and 75th quantiles,
respectively (the first and third quartiles).
- The interquartile range is the difference between
the 75th and 25th quantiles: \[IQR = Q_3 - Q_1\]
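As a quick check of the definitions above, here is a minimal sketch on a small made-up vector (`x` is a hypothetical example, not the penguin data). It computes \(Q_1\) and \(Q_3\) with base R's `quantile()` and compares their difference with the built-in `IQR()` function.

```r
# a small made-up vector, used only for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Q1 and Q3 are the 25th and 75th quantiles
# (unname() drops the "25%" / "75%" labels that quantile() attaches)
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))

# interquartile range computed by hand
iqr_by_hand <- q3 - q1

# base R also provides IQR(), which should give the same value
iqr_builtin <- IQR(x)

print(c(Q1 = q1, Q3 = q3, IQR = iqr_by_hand))
```

For this vector \(Q_1 = 3\) and \(Q_3 = 7\), so the interquartile range is 4.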
# Load the penguin package
library(palmerpenguins)
# save the penguin data locally
df_p <- penguins
# find the median of the body mass
median(df_p$body_mass_g)
## [1] NA
- The median is NA because body_mass_g contains missing values (NA). You have two options for handling them:
# option 1: tell R to ignore NA when doing the median
median(df_p$body_mass_g, na.rm=TRUE)
## [1] 4050
# option 2: tidy your data before using it
tidy_mass <- na.omit(df_p$body_mass_g)
median(tidy_mass)
## [1] 4050
# option 2 is better if you use the mass data many times,
# because you don't have to type na.rm=TRUE on every call
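The two options can also be compared on a small made-up vector, which makes it easier to see what each one does. This is only a sketch: `mass` is a hypothetical vector, not the penguin data.

```r
# a made-up vector with one missing value, used only for illustration
mass <- c(3000, 3500, NA, 4000, 4500)

# how many values are missing?
n_missing <- sum(is.na(mass))

# by default NA propagates through median()
m0 <- median(mass)

# option 1: skip the missing value for this one call
m1 <- median(mass, na.rm = TRUE)

# option 2: remove the missing value once, then reuse the clean vector
clean <- na.omit(mass)
m2 <- median(clean)

# quantile() is stricter than median(): it stops with an error on NA
# unless na.rm = TRUE is given
quantile_fails <- inherits(try(quantile(mass), silent = TRUE), "try-error")
```

Both options give the same median here. Note that `na.omit()` shortens the vector, so the cleaned values no longer line up row by row with the original data frame.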
# find the 50th percentile and check that it matches median()
quantile(tidy_mass, .50)
## 50%
## 4050
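# find all the quartiles at the same time
# (with no probabilities given, quantile() returns the minimum, the three quartiles, and the maximum)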
quantile(tidy_mass)
##   0%  25%  50%  75% 100%
## 2700 3550 4050 4750 6300
# find other percentiles, for example the 10th:
quantile(tidy_mass, .10)
## 10%
## 3300
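To connect `quantile()` back to the definition, the following sketch checks what share of the data actually falls below the 10th percentile, and shows that several percentiles can be requested in one call. The vector `x` is made up for illustration; with real data that contains ties, the share is only approximately 10%.

```r
# made-up data: the integers 1 to 100
x <- 1:100

# the 10th percentile (unname() drops the "10%" label)
p10 <- unname(quantile(x, 0.10))

# share of values strictly below the 10th percentile
share_below <- mean(x < p10)

# several percentiles at once: pass a vector of probabilities
several <- quantile(x, c(0.10, 0.50, 0.90))

print(share_below)
print(several)
```

`mean()` of a logical vector gives the proportion of TRUE values, which is why `mean(x < p10)` works as a share.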
Logic Operators
- Sometimes you want to work with only a subset of your data. Logical operators let you select the rows that meet a condition.
# Find all penguins that are from Torgersen island
torgersen <- df_p[df_p$island == "Torgersen" ,]
# Find the body mass of all penguins that are from Torgersen island
torgersen_mass <- df_p[df_p$island == "Torgersen" , "body_mass_g"]
# Find all penguins with a body mass less than or equal to 3500g
# (body_mass_g contains NA; which() keeps only the rows where the condition is TRUE,
# so penguins with a missing mass are dropped instead of showing up as rows of NA)
skinny <- df_p[which(df_p$body_mass_g <= 3500), ]
# Find all penguins with a body mass less than or equal to 3500g that are also from Torgersen island
torgersen_and_skinny <- df_p[which(df_p$body_mass_g <= 3500 & df_p$island == "Torgersen"), ]
# Find all penguins with a body mass less than or equal to 3500g or that are from Torgersen island
torgersen_or_skinny <- df_p[which(df_p$body_mass_g <= 3500 | df_p$island == "Torgersen"), ]
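Comparisons on a column that contains NA return NA for the missing entries, and an NA in a row index produces a row filled with NA. The sketch below shows how `&` and `|` behave in that case, using a small made-up data frame (`toy` and its values are hypothetical, not the penguin data). Wrapping the condition in `which()` is one way to keep only the rows where the condition is definitely TRUE.

```r
# made-up data frame, used only for illustration
toy <- data.frame(
  island      = c("Torgersen", "Biscoe", "Torgersen", "Dream"),
  body_mass_g = c(3400, NA, 4200, 3300)
)

light        <- toy$body_mass_g <= 3500      # TRUE  NA    FALSE TRUE
on_torgersen <- toy$island == "Torgersen"    # TRUE  FALSE TRUE  FALSE

# AND: NA & FALSE is FALSE, because the result is FALSE either way
both <- light & on_torgersen                 # TRUE  FALSE FALSE FALSE

# OR: NA | FALSE is NA, because the result depends on the unknown value
either <- light | on_torgersen               # TRUE  NA    TRUE  TRUE

# indexing with a logical vector that contains NA adds an all-NA row
with_na_row <- toy[either, ]

# which() returns the positions of the TRUE values only
without_na_row <- toy[which(either), ]

print(nrow(with_na_row))
print(nrow(without_na_row))
```

An alternative with the same effect is to add `!is.na(toy$body_mass_g) &` in front of a condition that involves the mass.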