• Today’s Agenda
  • This creates a section header
  • Data frame basics
  • Working with Penguin data
    • Activities

Today’s Agenda

  • Packages
  • Tables

This creates a section header

  • This is a markdown file; it lets us take notes and include R code in our notes
  • Reminder: to make an R code chunk, click the green C above
# these make comments
3+5
## [1] 8
  • Each plus sign starts one item in a bulleted list
  • This is a test line
    • this is a bulleted list inside a list

Data frame basics

  • Inside the package palmerpenguins is a dataset called penguins
# we are looking at the penguin data
# to load the package we use the command library()
library(palmerpenguins)
penguins
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
# add the data set to our environment
data_penguins <- penguins
  • If R says penguins is not found, the package is not installed or has not been loaded with library()
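
A few base R functions give a quick overview of any data frame. The sketch below uses a small hand-typed data frame (toy_penguins, a hypothetical name, with three rows copied from the output above) so that it runs even when palmerpenguins is not installed. The same calls should work on data_penguins.

```r
# toy_penguins is a hypothetical stand-in; values are copied from the first rows printed above
toy_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  island = c("Torgersen", "Torgersen", "Torgersen"),
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18)
)

nrow(toy_penguins)   # number of rows (observations)
ncol(toy_penguins)   # number of columns (variables)
names(toy_penguins)  # column names
str(toy_penguins)    # type and first values of each column
```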

Installing Packages

  • we should never install a package in our workspace
  • always install in the console
  • the command to install is install.packages("package_name"), with the package name in quotes
  • the command to LOAD the package is library()
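
The install-then-load routine above can be sketched as a check that loads the package only if it is already installed. This is one possible pattern, not the only one; pkg is a hypothetical variable name, and the install itself is still left to the console.

```r
# pkg is a hypothetical variable holding the package name
pkg <- "palmerpenguins"

if (requireNamespace(pkg, quietly = TRUE)) {
  # the package is installed, so load it
  library(pkg, character.only = TRUE)
} else {
  # not installed: remind ourselves to install it from the console
  message("Run install.packages(\"", pkg, "\") in the console first")
}
```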

Working with Penguin data

# to get more info about something use ?
?palmerpenguins
# mean(data_penguins)
# didn't work: mean() needs a numeric vector, not a whole data frame

# to get a variable from a data set use $
bills_length <- data_penguins$bill_length_mm

# this produces a vector (a single column or row from a data frame/table/matrix)

# mean(bills_length)
# returns NA because the column contains missing (NA) values

mean(bills_length, na.rm = TRUE)
## [1] 43.92193
# the average bill length of all penguins in the data set is 43.92 mm
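
To see the effect of na.rm on its own, here is a minimal sketch with a short hand-typed vector. toy_bills is a hypothetical name; its values are the first five bill lengths printed earlier, including the missing one.

```r
# toy_bills is a hypothetical vector: the first five bill lengths, one of them missing
toy_bills <- c(39.1, 39.5, 40.3, NA, 36.7)

mean(toy_bills)                # NA: one missing value makes the whole mean missing
sum(is.na(toy_bills))          # counts how many values are missing
mean(toy_bills, na.rm = TRUE)  # drops the NA first, then averages the other four values
```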
# getting a specific value is like playing battleship: data[row, column]
data_penguins[5,2]
## # A tibble: 1 × 1
##   island   
##   <fct>    
## 1 Torgersen
data_penguins[7,4]
## # A tibble: 1 × 1
##   bill_depth_mm
##           <dbl>
## 1          17.8
# to get a row
data_penguins[3,]
## # A tibble: 1 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           40.3            18              195        3250 fema…
## # … with 1 more variable: year <int>
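
The [row, column] pattern can be tried on a small hand-typed data frame. toy is a hypothetical name, with values copied from the first three rows of the penguin output. Note one difference: a plain data.frame returns the bare value for a single cell, while the tibble above returns a 1 × 1 tibble.

```r
# toy is a hypothetical stand-in built with base R's data.frame()
toy <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  island = c("Torgersen", "Torgersen", "Torgersen"),
  bill_length_mm = c(39.1, 39.5, 40.3)
)

toy[2, 3]           # row 2, column 3: a single value
toy[3, ]            # leave the column blank to get the whole row
toy[, 3]            # leave the row blank to get the whole column
toy$bill_length_mm  # the same column, selected by name with $
```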

Activities

First thing we will do is install a new package. In the CONSOLE, run the appropriate code to install the package fivethirtyeight.

  1. Load the fivethirtyeight package (which you should have just installed), locally
# code to load the package
  2. Inside this package is a dataset called biopics, full of data about biographical movies. Choose an appropriate name and save this data frame to your local environment.
# saving data frame to environment
  3. Try using the function head() on the data set. What does the head() function do?
# See head() of biopic data

Type your answer in words here or write your answer as a comment in the code chunk

  4. To read more about a particular data set you can use the help function. Run the code chunk below and explain what the column box_office in the data frame represents.
# putting a ? at the beginning tells you more about the data
?biopics
## No documentation for 'biopics' in specified packages and libraries:
## you could try '??biopics'

Type your answer in words here or write your answer as a comment in the code chunk

  5. How many movies are represented in the data frame? How many characteristics of each movie are recorded?

Type your answer to the question here

  6. Classify the following variables as quantitative (discrete or continuous) or qualitative (nominal or ordinal).
  • number_of_subjects:
  • type_of_subject:
  • box_office:
  7. Save the number_of_subjects column to your local environment.
# saving number_of_subjects data locally
  8. To see how many times each number of subjects appears, we can use the table() function.
# use table() on the local variable you made in 7.
  9. Finally, try to run barplot(table()) on the variable you made in 7.
# use barplot(table()) on the local variable you made in 7.
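
As a warm-up for the last two steps, here is a small sketch of table() and barplot() on made-up numbers. toy_counts is a hypothetical vector, not taken from the biopics data.

```r
# toy_counts is a hypothetical vector of made-up values
toy_counts <- c(1, 1, 2, 1, 3, 1, 2)

# table() counts how many times each distinct value appears
counts <- table(toy_counts)
counts

# barplot() draws one bar per distinct value, with height equal to its count
barplot(counts)
```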