The Central Limit Theorem

  • there is a quantity describing the population that we want to know (often a mean or a proportion): the population parameter

  • we can take a sample of size \(n\) and compute the same quantity for the sample: the sample statistic

  • Goal: estimate the population parameter using a sample statistic (the sample statistic is sometimes called a point estimate).

  • if we repeatedly sample from our population at a fixed sample size, the resulting sample statistics will be centered around the population parameter

  • the central limit theorem says that the distribution of those sample statistics (the sampling distribution) will be approximately normal

  • What normal distribution? What is the center? How spread out is it?
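
A small simulation can make the last few bullets concrete. This is only a sketch under assumed settings: the exponential population (mean 5, standard deviation 5), the sample size of 40, the 5000 repetitions, and the seed are all illustrative choices, not part of the notes.

```r
set.seed(1)                       # assumed seed, only so the run is reproducible
pop_mean <- 5                     # an exponential with rate 1/5 has mean 5 and sd 5
n <- 40                           # illustrative sample size
reps <- 5000                      # number of repeated samples

# Draw many samples of size n from a skewed population and keep each sample mean
sample_means <- replicate(reps, mean(rexp(n, rate = 1 / pop_mean)))

mean(sample_means)                # center: should land close to pop_mean
sd(sample_means)                  # spread: should land close to pop_mean / sqrt(n)
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped picture even though the population itself is strongly skewed.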

Conditions for Using CLT for means

  • normality condition:
    • if your sample size is large enough (rule of thumb: \(n > 30\)), this is generally treated as met.
    • if your sample size is \(n \leq 30\), then your population needs to be approximately normally distributed.
    • rule of thumb: if \(n \leq 30\), the sample should show no significant outliers
  • independence condition:
    • the observations in the sample need to be independent of one another
    • simple random sampling satisfies this

Image source: OpenIntro

CLT for means

  • \(\mu\) is the population mean
  • \(\bar{X}\) is the sample mean
  • \(\sigma\) is the population standard deviation
  • \(s\) is the sample standard deviation
  • \(n\) is sample size
  • the standard deviation of the sampling distribution is called the standard error; for the sample mean it is \(\sigma/\sqrt{n}\), the second argument of \(N\) below

\[\bar{X} \sim N \left(\mu,\frac{\sigma}{\sqrt{n}} \right)\]

  • as the sample size increases (\(n\) gets large), the standard error gets smaller, so the sampling distribution becomes narrower around the population mean.
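
To see how quickly the standard error shrinks, the sketch below evaluates \(\sigma/\sqrt{n}\) for a few sample sizes. The value \(\sigma = 12\) and the particular sample sizes are assumptions chosen for illustration.

```r
sigma <- 12                         # assumed population standard deviation
n <- c(9, 36, 144, 576)             # illustrative sample sizes, each 4 times the last
se <- sigma / sqrt(n)               # standard error of the sample mean for each n

data.frame(n = n, standard_error = se)
```

Because \(n\) sits under a square root, quadrupling the sample size only halves the standard error.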

Example: Suppose that the population mean is \(5\) and the population standard deviation is \(12\). What is the probability that a simple random sample of size \(36\) will have a sample mean greater than \(7\)?

z_score <- (7-5)/(12/sqrt(36))  # (sample mean - mu) / standard error = 2/2 = 1
1-pnorm(z_score)
## [1] 0.1586553
pnorm(1,lower.tail = FALSE)
## [1] 0.1586553
1-pnorm(7,5,12/sqrt(36))
## [1] 0.1586553
# If you take a sample of size 36 from a population with mean 5 and standard deviation 12, there is about a 15.9% chance that the sample mean will be larger than 7.
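
As a sanity check on the calculation above, the probability can also be approximated by simulation. This is a sketch, not part of the original example: it assumes the population is normal (the problem only states its mean and standard deviation), and the seed and the 10000 repetitions are arbitrary choices.

```r
set.seed(2)                        # assumed seed, only so the run is reproducible
mu <- 5                            # population mean from the example
sigma <- 12                        # population standard deviation from the example
n <- 36                            # sample size from the example
reps <- 10000                      # number of simulated samples (arbitrary)

# Assumption: a normal population; the example does not specify its shape
sample_means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

sim_prob <- mean(sample_means > 7) # share of simulated sample means above 7
sim_prob
```

The simulated proportion should land close to the pnorm result of roughly 0.159, with small differences from run to run.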

Activity:

  1. A population has mean \(\mu = 143\) and standard deviation \(\sigma = 15\). Describe the sampling distribution for a sample of size \(150\). Draw a picture of the sampling distribution. Label the area corresponding to the probability of a sample mean greater than \(144\). Find the probability.

  2. A population has mean \(\mu = 22\) and standard deviation \(\sigma = 1.4\). What is the standard error if the sample size is \(50\)? How many standard errors away from the population mean is a sample mean of \(\bar{X} = 23\)? Find the probability that a sample of size \(50\) has a mean less than \(23\).

  3. A population has mean \(\mu = 22\) and standard deviation \(\sigma = 1.4\). You plan to take a sample of \(50\) observations. Find the \(2.5\) percentile and the \(97.5\) percentile of the sampling distribution for the sample mean.

  • to do this problem, you need the z-scores corresponding to the 97.5 and 2.5 percentiles
qnorm(.975)
## [1] 1.959964
qnorm(.025)
## [1] -1.959964
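
A z-score is turned back into a value on the sample-mean scale with \(\mu + z \cdot \sigma/\sqrt{n}\). The sketch below shows that conversion using the numbers from the earlier worked example (\(\mu = 5\), \(\sigma = 12\), \(n = 36\)) rather than the activity's numbers; the variable names are illustrative.

```r
mu <- 5                                  # population mean (earlier worked example)
sigma <- 12                              # population standard deviation
n <- 36                                  # sample size
se <- sigma / sqrt(n)                    # standard error of the sample mean

z <- qnorm(c(0.025, 0.975))              # z-scores for the 2.5 and 97.5 percentiles
by_z <- mu + z * se                      # convert z-scores to the sample-mean scale

# Same percentiles in one step, giving qnorm the mean and standard error directly
direct <- qnorm(c(0.025, 0.975), mean = mu, sd = se)

by_z
direct
```

Both approaches give the same pair of values; the middle 95% of sample means falls between them.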