A random variable is a numerical outcome of a random process.
eg. the sum of two dice, or the number of red cards drawn when drawing 5 cards. Sampling at random from a population also produces random variables.
Most important example for us: measuring a quantitative summary (or characteristic) of a sample. The randomness comes from the possibility of having selected a different sample.
eg. A student has an assigned text book for a course. How much a random student in the course spends on the text book is a random variable, X.
Random variables are usually denoted with capital letters: X, Y, Z.
The book store assumes that students will either: not buy a book, buy the book used, or buy it brand new. The possible outcomes are usually denoted with lower case letters.
Revenue for the book store is $x_1 = 0$, $x_2 = 137$, $x_3 = 170$ for the three possibilities, and the book store believes these occur with probability .20, .55, and .25 respectively. In probability notation:
$$P(X = 0) = .20, \quad P(X = 137) = .55, \quad P(X = 170) = .25$$
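A valid discrete probability distribution needs probabilities that each lie between 0 and 1 and that sum to 1. As a quick sanity check on the book store's figures, here is a minimal sketch in R (the names `x`, `p`, `all_in_range`, and `sums_to_one` are introduced for illustration and are not part of the original notes):

```r
# Possible revenues and the probabilities the book store assumes
x <- c(0, 137, 170)
p <- c(0.20, 0.55, 0.25)

# Every probability must lie between 0 and 1
all_in_range <- all(p >= 0 & p <= 1)

# The probabilities must sum to 1; all.equal() tolerates floating-point error
sums_to_one <- isTRUE(all.equal(sum(p), 1))

all_in_range
sums_to_one
```

Using `all.equal()` rather than `==` is a deliberate choice, since sums of decimals are not always represented exactly.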
library(ggplot2)

# Text book spending: each possible price with its probability
books_data <- data.frame(prices = c("0","137","170"), probs = c(.2,.55,.25))
ggplot(books_data, aes(x = prices, y = probs)) +
  geom_bar(stat = "identity") +
  ylim(0, 1) +
  labs(title = "Proportion of students buying books at different prices") +
  ylab("Proportion") +
  xlab("Cost")
Another discrete probability distribution is that of rolling a die:
# Fair six-sided die: each face has probability 1/6
die <- data.frame(roll = 1:6, probs = rep(1/6, 6))
ggplot(die, aes(x = roll, y = probs)) +
  geom_bar(stat = "identity") +
  ylim(0, 1) +
  scale_x_continuous(name = "Roll", breaks = 1:6)
Probability Distribution
Add notes here
The expected value of a random variable is:
$$E(X) = x_1 \times P(X = x_1) + \cdots + x_k \times P(X = x_k) = \sum_{i=1}^{k} x_i P(X = x_i)$$
and the standard deviation of the random variable is
$$\sqrt{\sum_{i=1}^{k} (x_i - E(X))^2 P(X = x_i)}$$
If all the probabilities are the same, then
$$E(X) = x_1 \times \frac{1}{k} + \cdots + x_k \times \frac{1}{k} = \sum_{i=1}^{k} \frac{x_i}{k} = \bar{x}$$
This is just the mean.
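For instance, applying this equal-probability case to the fair die from earlier, where each of the six faces has probability 1/6, gives the following worked check:

```latex
E(X) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + \cdots + 6 \times \tfrac{1}{6}
     = \frac{1 + 2 + 3 + 4 + 5 + 6}{6}
     = \frac{21}{6}
     = 3.5
```

Note that 3.5 is not a value the die can actually show: the expected value is a long-run average, not a possible outcome.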
eg. The expected price of a text book is:
0*(.2)+ 137 * (.55) + 170 *(.25)
## [1] 117.85
# the expected value of a student's cost at the book store is $117.85
books_data_2 <- data.frame(prices = c(0,137,170), probs = c(.2,.55,.25))
expected_cost <- sum(books_data_2$prices * books_data_2$probs)
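The same calculation can be wrapped in a small helper so it can be reused for other distributions. This is a sketch only: `expected_value()` is a hypothetical name introduced here, not a function from the notes or from base R. It is cross-checked against base R's `weighted.mean()`, which gives the same quantity when the weights are probabilities that sum to 1.

```r
# Hypothetical helper (not from the original notes): expected value of a
# discrete random variable given its possible values and their probabilities
expected_value <- function(values, probs) {
  stopifnot(length(values) == length(probs))
  sum(values * probs)
}

prices <- c(0, 137, 170)
probs  <- c(0.20, 0.55, 0.25)

ev <- expected_value(prices, probs)
ev

# Cross-check: a weighted mean with probability weights is the same quantity
wm <- weighted.mean(prices, probs)
wm
```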
A small example:
# 2, 3, 3, 4, 4, 4
# the old way
(2+3+3+4+4+4)/6
## [1] 3.333333
# The new way
(2*1/6) + (3*2/6)+(4*3/6)
## [1] 3.333333
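The "new way" weights each distinct value by the proportion of times it occurs. A minimal sketch of getting those proportions automatically, using base R's `table()` and `prop.table()` (the variable names are illustrative assumptions, not from the notes):

```r
obs <- c(2, 3, 3, 4, 4, 4)

# Relative frequency of each distinct value: 1/6, 2/6, 3/6
freq <- prop.table(table(obs))

values <- as.numeric(names(freq))   # distinct values: 2, 3, 4
probs  <- as.numeric(freq)          # their proportions

# Weighted sum of values by proportion, compared with the ordinary mean
new_way <- sum(values * probs)
old_way <- mean(obs)

new_way
old_way
```

Both results agree, which is the point of the small example: the expected value formula reduces to the usual average when the probabilities are the observed proportions.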
The standard deviation is:
sqrt(sum((books_data_2$prices - expected_cost)^2*books_data_2$probs))
## [1] 60.49238
# $60.49 is the standard deviation of a student's cost at the book store.
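As a further check of the formula, the same steps can be applied to the fair die from earlier. This is a sketch under the assumption that the die is fair; `discrete_sd()` is a hypothetical helper name introduced here for illustration and does not appear in the original notes.

```r
# Hypothetical helper (not from the original notes): standard deviation of a
# discrete random variable from its values and probabilities
discrete_sd <- function(values, probs) {
  stopifnot(length(values) == length(probs))
  ev <- sum(values * probs)             # expected value E(X)
  sqrt(sum((values - ev)^2 * probs))    # root of the probability-weighted squared deviations
}

# Fair die: faces 1 to 6, each with probability 1/6
die_sd <- discrete_sd(1:6, rep(1/6, 6))
die_sd

# Text book example, for comparison with the standard deviation computed above
book_sd <- discrete_sd(c(0, 137, 170), c(0.20, 0.55, 0.25))
book_sd
```

The deviations are taken from the expected value and weighted by probability, so a value that rarely occurs contributes little to the spread even if it is far from E(X).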
eg. [Figures not reproduced here. Image source: OpenIntro]