Today’s Agenda

  • Quantiles
  • Logic Operators

Quantiles

  • The xth percentile (also called the x/100 quantile) is a value such that about x% of the values in a data set are less than or equal to it and the rest are greater.
  • For example, the 10th percentile is a value such that 10% of the data is at or below it and the other 90% is above it.
  • The 50th percentile has a special name: the median.
  • \(Q_1\) and \(Q_3\), the first and third quartiles, are the 25th and 75th percentiles, respectively.
  • The interquartile range (IQR) is the difference between the 75th and 25th percentiles: \[\text{IQR} = Q_3 - Q_1\]
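As a quick check of the IQR formula, here is a minimal sketch on a small made-up vector (the values 1 to 10 are hypothetical, chosen so the result is easy to verify by hand). It computes \(Q_1\) and \(Q_3\) with `quantile()` and compares their difference with base R's `IQR()` function, which uses the same default interpolation as `quantile()`.

```r
# Hypothetical example data: the integers 1 to 10
x <- 1:10

# First and third quartiles (25th and 75th percentiles)
q1 <- quantile(x, 0.25)
q3 <- quantile(x, 0.75)

# IQR from the definition Q3 - Q1 (unname() drops the "75%" label)
iqr_manual <- unname(q3 - q1)
iqr_manual
## [1] 4.5

# Base R's built-in IQR() gives the same answer
IQR(x)
## [1] 4.5
```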
# Load the palmerpenguins package
library(palmerpenguins)

# Save the penguin data locally
df_p <- penguins

# Find the median body mass
median(df_p$body_mass_g)
## [1] NA
  • median() returns NA because body_mass_g contains missing values (NA). There are two ways to handle them:
# Option 1: tell R to ignore NA values when computing the median
median(df_p$body_mass_g, na.rm = TRUE)
## [1] 4050
# Option 2: remove the NA values first, then work with the cleaned vector
tidy_mass <- na.omit(df_p$body_mass_g)
median(tidy_mass)
## [1] 4050
# Option 2 is more convenient if you will use the mass data many times,
# since you don't have to type na.rm = TRUE in every call
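Before choosing an option, it can help to know how many values are missing. The sketch below uses base R's `is.na()` on a small hypothetical vector (not the penguin data) so the counts are easy to check by hand; the same `sum(is.na(...))` call works on `df_p$body_mass_g`.

```r
# Hypothetical body masses (g) with two missing values
mass <- c(3750, NA, 3250, 4050, NA, 3450)

# is.na() returns TRUE for each missing entry; sum() counts the TRUEs
n_missing <- sum(is.na(mass))
n_missing
## [1] 2

# na.omit() drops the missing entries, leaving 4 values
clean_mass <- na.omit(mass)
length(clean_mass)
## [1] 4

# Both options give the same median
median(mass, na.rm = TRUE)
## [1] 3600
median(clean_mass)
## [1] 3600
```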
# Find the 50th percentile and check that it matches median()
quantile(tidy_mass, .50)
##  50% 
## 4050
# By default, quantile() returns the minimum (0%), the three quartiles, and the maximum (100%)
quantile(tidy_mass)
##   0%  25%  50%  75% 100% 
## 2700 3550 4050 4750 6300
# Find other percentiles, e.g. the 10th
quantile(tidy_mass, 0.10)
##  10% 
## 3300
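`quantile()` also accepts a vector of probabilities, and the percentile definition above can be checked directly: the share of values at or below the 10th percentile should be about 10%. A minimal sketch on hypothetical data (the integers 1 to 100). Note that R's default method (`type = 7`) interpolates between neighboring values, so a percentile need not be one of the data points; the `type` argument selects other definitions.

```r
# Hypothetical data: the integers 1 to 100
x <- 1:100

# Several percentiles in one call; with the default type = 7 this
# returns 10.9, 50.5 and 90.1 (interpolated, not actual data values)
p <- quantile(x, c(0.10, 0.50, 0.90))
p

# Check the definition: share of values at or below the 10th percentile
p10 <- quantile(x, 0.10)
mean(x <= p10)
## [1] 0.1
```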

Logic Operators

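  • Logical operators let you select only the rows of your data that meet a condition: comparisons such as == and <= test each row, & (and) requires all conditions to be true, and | (or) requires at least one to be true.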
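Subsetting works because a comparison returns a logical vector with one TRUE or FALSE per row, and `[` keeps the rows where that vector is TRUE. A minimal sketch on a small hypothetical data frame (the column names mirror the penguin data, but the values are made up):

```r
# Hypothetical mini data set
df <- data.frame(
  island      = c("Torgersen", "Biscoe", "Torgersen", "Dream"),
  body_mass_g = c(3400, 4500, 3800, 3300)
)

# A comparison gives one TRUE/FALSE per row
df$island == "Torgersen"
## [1]  TRUE FALSE  TRUE FALSE

# & requires both conditions; | requires at least one
df$island == "Torgersen" & df$body_mass_g <= 3500
## [1]  TRUE FALSE FALSE FALSE
df$island == "Torgersen" | df$body_mass_g <= 3500
## [1]  TRUE FALSE  TRUE  TRUE

# Rows where the logical vector is TRUE are kept
selected <- df[df$island == "Torgersen" | df$body_mass_g <= 3500, ]
nrow(selected)
## [1] 3
```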
# Find all penguins that are from Torgersen island
torgersen <- df_p[df_p$island == "Torgersen", ]

# Find the body mass of all penguins that are from Torgersen island
torgersen_mass <- df_p[df_p$island == "Torgersen", "body_mass_g"]

# Find all penguins with a body mass less than or equal to 3500 g
# (which() drops rows where body_mass_g is NA instead of returning rows of NAs)
skinny <- df_p[which(df_p$body_mass_g <= 3500), ]

# Find all penguins with a body mass less than or equal to 3500 g that are also from Torgersen island
torgersen_and_skinny <- df_p[which(df_p$body_mass_g <= 3500 & df_p$island == "Torgersen"), ]

# Find all penguins with a body mass less than or equal to 3500 g or that are from Torgersen island
torgersen_or_skinny <- df_p[which(df_p$body_mass_g <= 3500 | df_p$island == "Torgersen"), ]
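One caution when combining conditions: a comparison involving NA returns NA rather than FALSE, and indexing a data frame with NA produces a row filled with NA. Wrapping the condition in `which()`, which returns only the positions that are TRUE, avoids this. A minimal sketch on a hypothetical data frame with one missing body mass:

```r
# Hypothetical data with one missing body mass
df <- data.frame(
  island      = c("Torgersen", "Torgersen", "Biscoe"),
  body_mass_g = c(3400, NA, 3300)
)

# The comparison is NA for the missing value
df$body_mass_g <= 3500
## [1] TRUE   NA TRUE

# NA & FALSE is FALSE, but NA & TRUE is NA (while NA | TRUE is TRUE)
NA & FALSE
## [1] FALSE
NA & TRUE
## [1] NA

# Plain logical indexing keeps an all-NA row for the missing value
nrow(df[df$body_mass_g <= 3500, ])
## [1] 3

# which() keeps only the rows where the condition is TRUE
nrow(df[which(df$body_mass_g <= 3500), ])
## [1] 2
```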