Suppose you are trying to predict the proportion of Davidson students that voted for Kamala Harris.

There is some true proportion, unknown to us. For the sake of this example, let’s say our best guess is 50%.

If we sample 400 students, what is the expected value for the number of students who voted for Harris? We expect there to be 200, since 400 × 0.5 = 200. Let’s simulate this, starting with a smaller sample of 100 students, where we expect 50:

candidates <- c("H","T")
table(sample(candidates, 100, replace=TRUE))
## 
##  H  T 
## 48 52

How big is my error?

  • I’m two away from my expectation of 50/50
  • Alternatively, I expected to get .5 but I actually got .52, so my proportion was off by .02
table(sample(candidates, 400, replace=TRUE))
## 
##   H   T 
## 194 206

In the larger simulation:

  • I’m 6 votes away from my expectation of 200, so my error got bigger
  • I expected .5 and got .515, so my proportion was off by .015
table(sample(candidates, 4000, replace=TRUE))
## 
##    H    T 
## 1945 2055
  • the number of votes is off by 55
  • the proportion is off by .01375

As you increase your sample size, the number of votes you are off by tends to grow, but the proportion you are off by tends to shrink. This is the Law of Large Numbers: the sample proportion gets closer to the true proportion as the sample gets larger.
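
Any single run can land unusually close to or far from 50/50, so one sample per size is a noisy guide. The sketch below repeats each simulation many times and averages the size of the miss. It assumes a true proportion of 0.5; the helper name `average_miss`, the 2,000 repetitions, and the seed are illustrative choices rather than part of the original example.

```r
set.seed(1)                      # arbitrary seed, only so the run is repeatable
candidates <- c("H", "T")

# Hypothetical helper: average miss over many repeated samples of size n
average_miss <- function(n, reps = 2000) {
  # Number of "H" votes in each of the repeated samples
  h_votes <- replicate(reps, sum(sample(candidates, n, replace = TRUE) == "H"))
  c(n = n,
    votes_off = mean(abs(h_votes - n / 2)),         # average miss in number of votes
    proportion_off = mean(abs(h_votes / n - 0.5)))  # average miss in proportion
}

# One row per sample size
results <- t(sapply(c(100, 400, 4000), average_miss))
results
```

Reading down the table, the `votes_off` column should rise with the sample size while the `proportion_off` column falls.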

  • the typical amount that you are off by is called the standard error
  • the standard error represents how much you expect your guesses to be wrong by, on average
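
For a two-outcome vote like this one, the usual textbook formulas for the standard error are sqrt(n * p * (1 - p)) for the number of votes and sqrt(p * (1 - p) / n) for the proportion. The sketch below computes both for the three sample sizes used above, assuming the true proportion p is 0.5, and compares the proportion formula with the spread of simulated samples drawn with `rbinom`. The variable names, the seed, and the 5,000 repetitions are illustrative assumptions.

```r
set.seed(2)                 # arbitrary seed for repeatability
p <- 0.5                    # assumed true proportion voting for Harris
n <- c(100, 400, 4000)      # sample sizes from the simulations above

se_votes <- sqrt(n * p * (1 - p))        # standard error of the vote count
se_proportion <- sqrt(p * (1 - p) / n)   # standard error of the proportion

# Empirical check: standard deviation of the sample proportion
# across 5,000 simulated samples of each size
sim_se <- sapply(n, function(size) sd(rbinom(5000, size, p) / size))

data.frame(n, se_votes, se_proportion, sim_se)
```

With p = 0.5 the formulas give about 5, 10, and 31.6 votes, and .05, .025, and .0079 as proportions: the count error grows with the sample size while the proportion error shrinks.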