Today’s Agenda

  • Random Variables
  • Expected Value
  • Uniform Distribution

Random Variable

  • A random variable is a numerical outcome of a random process.

  • eg. the sum of two dice, the number of red cards drawn when drawing 5 cards, and a measurement taken on an individual sampled at random from a population are all random variables.

  • Most important example for us: measuring a quantitative summary (or characteristic) of a sample. The randomness comes from the possibility of having selected a different sample.

  • eg. A student has an assigned textbook for a course. How much a random student in the course spends on the textbook is a random variable, \(X\).

  • Random variables are usually denoted with capital letters such as \(X\), \(Y\), \(Z\).

  • The bookstore assumes that students will either not buy a book, buy the book used, or buy it brand new. The possible values of a random variable are usually denoted with lowercase letters.

  • Revenue for the bookstore is \(x_1 = 0\), \(x_2 = 137\), \(x_3 = 170\) for the three possibilities, and the bookstore believes they occur with probabilities \(.20\), \(.55\), and \(.25\), respectively. In probability notation:

\[P(X=0)=.20,\quad P(X=137)=.55,\quad P(X=170)=.25\]

  • This is called a discrete probability distribution because the possible values can be listed one by one (here there are only three).
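library(ggplot2)

# Prices are stored as strings so that each outcome gets its own bar
books_data <- data.frame(prices = c("0","137","170"), probs = c(.2,.55,.25))
ggplot(books_data, aes(x = prices, y = probs)) +
  geom_bar(stat = "identity") +
  ylim(0, 1) +
  # labs() takes title=, not main=
  labs(title = "Proportion of students buying books at different prices") +
  ylab("Proportion") +
  xlab("Cost")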

Another discrete probability distribution is that of rolling a die:

die <- data.frame(roll = 1:6, probs = rep(1/6, 6))

# roll is numeric, so use a continuous x scale with one tick per face
ggplot(die, aes(x = roll, y = probs)) +
  geom_bar(stat = "identity") +
  ylim(0, 1) +
  scale_x_continuous(name = "Roll", breaks = 1:6)

  • When all outcomes are equally likely, the distribution is called a uniform distribution.
  • This particular uniform distribution is a discrete probability distribution.
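
For contrast, the sum of two dice mentioned earlier is also a discrete random variable, but its distribution is not uniform. The following is a minimal sketch, assuming two fair six-sided dice; the names `sums` and `two_dice_probs` are introduced here for illustration only.

```r
# All 36 equally likely (first die, second die) pairs, reduced to their sums
sums <- as.vector(outer(1:6, 1:6, "+"))

# Probability of each possible sum: number of pairs giving that sum / 36
two_dice_probs <- table(sums) / length(sums)
two_dice_probs

# The probabilities differ across sums, so this distribution is not uniform
length(unique(as.numeric(two_dice_probs))) > 1
```

Each individual die is uniform, yet a sum of 7 can occur in six ways while a sum of 2 can occur in only one, so the bars of this distribution would not all be the same height.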

Probability Distributions

  • A probability distribution must satisfy two conditions:
    • all the probabilities must sum to \(1\).
    • none of the probabilities can be negative.
  • The expected value of a random variable with a uniform probability distribution is the mean of the possibilities.
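
The two conditions can be checked directly in R. This is only a sketch: `is_valid_distribution` is a hypothetical helper written for these notes, not a built-in function, and the tolerance `tol` is an assumed allowance for floating-point rounding in the sum.

```r
# Hypothetical helper: TRUE when `probs` has no negative entries and
# sums to 1 (up to a small rounding tolerance)
is_valid_distribution <- function(probs, tol = 1e-8) {
  all(probs >= 0) && abs(sum(probs) - 1) < tol
}

is_valid_distribution(c(.2, .55, .25))  # bookstore example
is_valid_distribution(rep(1/6, 6))      # fair die
is_valid_distribution(c(.5, .6, -.1))   # sums to 1 but has a negative entry
is_valid_distribution(c(.5, .6))        # no negatives but sums to 1.1
```

The first two calls describe valid distributions; the last two each break one of the conditions.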

Expected Value

The expected value of a random variable is:

\[E(X) = x_1 \times P(X=x_1) + \dots + x_k \times P(X=x_k) = \sum_{i=1}^k x_i P(X = x_i)\]

and the standard deviation of the random variable is

\[SD(X) = \sqrt{\sum_{i=1}^k (x_i-E(X))^2 P(X = x_i)}\]

If all the probabilities are the same, then each one equals \(\frac{1}{k}\) and \[E(X) = x_1 \times \frac{1}{k} + \dots + x_k \times \frac{1}{k} = \frac{\sum_{i=1}^k x_i}{k} = \bar{x}\]

which is just the ordinary mean of the possible values.
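
As a quick check of the equal-probability case, here is a sketch using the fair die from earlier. The variable names `rolls`, `die_probs`, and `expected_roll` are chosen for illustration.

```r
rolls <- 1:6
die_probs <- rep(1/6, 6)  # every face equally likely

# Expected value from the general formula: sum of value times probability
expected_roll <- sum(rolls * die_probs)
expected_roll

# The ordinary mean of the possible values gives the same number
mean(rolls)
```

Both calculations should agree, since multiplying every value by the same probability and summing is the same as summing and dividing by the number of values.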

eg. The expected cost of a textbook is:

0*(.2)+ 137 * (.55) + 170 *(.25)
## [1] 117.85
# the expected value of a student's cost at the bookstore is $117.85

books_data_2 <- data.frame(prices = c(0,137,170), probs = c(.2,.55,.25))

expected_cost <- sum(books_data_2$prices * books_data_2$probs)
  • The expected value is just a generalization of the mean.
  • The usual calculation for the mean assumes every option is equally likely; the expected value does not.

A small example:

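# data: 2, 3, 3, 4, 4, 4
# the old way: add everything up and divide by the count
(2+3+3+4+4+4)/6
## [1] 3.333333
# the new way: each distinct value times the proportion of times it occurs
(2*1/6) + (3*2/6)+(4*3/6) 
## [1] 3.333333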
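
The same comparison can be done without typing the proportions by hand. This sketch uses base R's `table()` and `weighted.mean()`; the names `values`, `props`, `distinct`, and `weighted` are introduced here for illustration.

```r
values <- c(2, 3, 3, 4, 4, 4)

# Proportion of times each distinct value occurs: 1/6, 2/6, 3/6
props <- table(values) / length(values)
distinct <- as.numeric(names(props))

# Expected-value form: distinct values weighted by their proportions
weighted <- weighted.mean(distinct, w = as.numeric(props))
weighted

# Ordinary mean of the raw data, for comparison
mean(values)
```

Treating the observed proportions as probabilities turns the ordinary mean into an expected value, which is the sense in which the expected value generalizes the mean.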

The standard deviation is:

sqrt(sum((books_data_2$prices - expected_cost)^2*books_data_2$probs))
## [1] 60.49238
# $60.49 is the standard deviation of a student's cost (not of the expected cost).
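
Written out term by term with the standard deviation formula and \(E(X) = 117.85\), the same calculation is:

\[\sqrt{(0-117.85)^2(.20) + (137-117.85)^2(.55) + (170-117.85)^2(.25)} = \sqrt{2777.7245 + 201.697375 + 679.905625} = \sqrt{3659.3275} \approx 60.49\]

Most of the variability comes from the students who buy nothing, since \(0\) is the value farthest from the expected cost.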

Continuous Distributions

  • When data are numeric and continuous, we can often make the bins of a histogram extremely small and still have a very readable graph.

eg.

Image source: OpenIntro

  • The biggest difference between continuous and discrete probability distributions is that the y-axis represents a different thing.
  • For discrete, it's the probability.
  • For continuous, it's the density, and to find the probability of an interval you have to find the area under the curve over that interval.

Image source: OpenIntro
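
To see why density is not probability, here is a small sketch using a continuous uniform distribution on the interval from 0 to 0.5. That distribution is an assumption chosen for illustration and is not the one in the figures. The calls `dunif()`, `punif()`, and `integrate()` are base R functions.

```r
# Height of the density curve at 0.25: the value is 2, which could not
# be a probability
dunif(0.25, min = 0, max = 0.5)

# P(0.1 <= X <= 0.3) is the area under the density between 0.1 and 0.3
area <- integrate(dunif, lower = 0.1, upper = 0.3, min = 0, max = 0.5)$value
area

# The same probability from the cumulative distribution function
cdf_diff <- punif(0.3, min = 0, max = 0.5) - punif(0.1, min = 0, max = 0.5)
cdf_diff
```

The area is the width of the interval times the height of the density, \(0.2 \times 2 = 0.4\), so the probability is a valid number between 0 and 1 even though the density itself exceeds 1.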