The standard normal distribution \(N(0,1)\)

Suppose \(Z\) is standard normal. That is, \(Z\) is a normally distributed variable with mean 0 and standard deviation 1, so its density curve is the bell curve centered at 0.
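The density curve can be drawn directly in R. The following is a minimal sketch using base graphics and dnorm(), which returns the height of the normal density; the plotting range from -4 to 4 and the variable names are choices made here for illustration.

```r
# Draw the N(0,1) density curve with base R graphics.
z <- seq(-4, 4, by = 0.01)   # grid of z values (range chosen for illustration)
density <- dnorm(z)          # height of the standard normal density at each z
plot(z, density, type = "l",
     xlab = "z", ylab = "density",
     main = "Standard normal density")
```

Nearly all of the area under the curve lies between -3 and 3, which is why a range of -4 to 4 shows the whole bell shape.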

R has nice built-in functions to answer two questions:

  1. What is \(P(Z \leq z)\)? In other words, what is the area under the density curve to the left of \(Z = z\)? This is a question of cumulative probability.

  2. What is the value of \(Z\) below which one finds a particular proportion of the area under the density curve? This is a question of finding a quantile.

Cumulative probabilities with pnorm()

Use the pnorm() function!

The pnorm(z) function gives the area to the left of \(z\) under the \(N(0,1)\) density curve.

Example: Determine \(P(Z \leq 1.54)\).

Answer:

pnorm(1.54)
## [1] 0.9382198

  • pnorm(1.54) = 0.9382
  • Interpretation: 93.82% of the distribution lies below \(z = 1.54\).
  • Note: The area to the right of 1.54 is 1-pnorm(1.54) = 0.0618
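The area to the right can also be requested directly. A short sketch, using the lower.tail argument of pnorm() (the variable names are just for illustration):

```r
# Two ways to get the area to the right of 1.54 under N(0,1).
right_by_subtraction <- 1 - pnorm(1.54)
right_direct <- pnorm(1.54, lower.tail = FALSE)  # upper tail requested directly

round(c(right_by_subtraction, right_direct), 4)  # both round to 0.0618
```

For values far out in the tail, the lower.tail = FALSE form tends to be more accurate numerically, since it avoids subtracting a number very close to 1 from 1.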

Example: Determine \(P(Z > 0.6)\).

Answer:

1-pnorm(0.6)
## [1] 0.2742531

Example: Find \(P(-0.23 < Z < 1.54)\)

Answer:

pnorm(1.54)-pnorm(-0.23)
## [1] 0.5291739
  • Why does pnorm(1.54)-pnorm(-0.23) do the trick?
  • Subtracting two “areas to the left” leaves the area in between.
  • So 52.92% of the distribution lies between -0.23 and 1.54.
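As a sanity check on the subtraction, the same area can be approximated by numerically integrating the density between the two endpoints. This is only a sketch for verification, using base R's integrate() and dnorm(); in practice the pnorm() subtraction is the way to go.

```r
# Area between -0.23 and 1.54, computed two ways.
by_subtraction <- pnorm(1.54) - pnorm(-0.23)
by_integration <- integrate(dnorm, lower = -0.23, upper = 1.54)$value

# The two results should agree up to numerical integration error.
all.equal(by_subtraction, by_integration, tolerance = 1e-6)
```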

Quantiles with qnorm()

Use the qnorm() function!

The qnorm(a) function gives the value \(L\) such that \(P(Z\leq L)=a.\)

Example: Determine the value \(L\) in \(N(0,1)\) that marks the 90th percentile. That is, find \(L\) such that the area to the left of \(L\) under the standard normal bell curve equals 0.9.

Answer:

qnorm(0.9)
## [1] 1.281552
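Since qnorm() answers the reverse question of pnorm(), the two functions undo each other. A brief sketch of that round trip:

```r
# qnorm() and pnorm() are inverse functions of each other.
L <- qnorm(0.9)          # the 90th percentile of N(0,1)
pnorm(L)                 # area to the left of L: recovers 0.9

qnorm(pnorm(1.54))       # going the other way recovers 1.54
```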

Other Normal Distributions

Fact: If \(X\) is \(N(\mu,\sigma)\) then \[Z = \frac{X-\mu}{\sigma} \text{ is } N(0,1).\]

Example: Suppose \(X\) is \(N(10,2)\), that is, normal with mean 10 and standard deviation 2, and we want to compute \(P(X < 9.1)\).

Solution 1: First convert to \(Z\)-scores: The \(Z\)-score for 9.1 in \(N(10,2)\) is \[z = \frac{9.1-10}{2} = -0.45.\] So \[P(X < 9.1)=P(Z < -0.45),\] and \(P(Z < -0.45)\) equals pnorm(-0.45):

pnorm(-0.45)
## [1] 0.3263552

Solution 2 in R: Specify the mean \(\mu\) and standard deviation \(\sigma\) in the pnorm() command and use the original value of \(X\), with no conversion to \(Z\)-scores:

pnorm(9.1, mean=10, sd=2)
## [1] 0.3263552
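The agreement between the two solutions is not special to 9.1. The sketch below checks it for several values at once, since pnorm() is vectorized; apart from 9.1, the values in x are made up for illustration.

```r
# Compare the Z-score route with the direct route for N(10, 2).
mu <- 10
sigma <- 2
x <- c(7, 9.1, 10, 12.5)        # illustrative values; only 9.1 is from the example

z <- (x - mu) / sigma           # convert each value to a Z-score
via_z  <- pnorm(z)              # Solution 1: standard normal probabilities
direct <- pnorm(x, mean = mu, sd = sigma)  # Solution 2: specify mean and sd

all.equal(via_z, direct)        # the two routes give the same probabilities
```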

Example: What value of \(X\) marks the top 1% of the \(N(10,2)\) distribution?

Solution 1: First, find the \(Z\)-score in \(N(0,1)\) that marks the top 1% of that distribution. This is the value that has area 0.99 to the left of it:

qnorm(.99)
## [1] 2.326348

Then, find the value \(L\) in the given \(N(10,2)\) distribution that has this \(Z\)-score by solving this equation for \(L\):

\[\begin{align*} \frac{L-10}{2}&=2.326 \\ L-10 &= 2\cdot 2.326 \\ L-10 &= 4.652 \\ L &= 10 + 4.652 \\ L &= 14.652 \end{align*}\]

Solution 2 in R: We can specify the mean and standard deviation for any normal distribution within the qnorm() function, without first converting the question to one about \(Z\)-scores:

qnorm(.99,mean=10,sd=2)
## [1] 14.6527
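The hand calculation amounts to \(L = \mu + \sigma z\), which can be done in R without rounding the \(Z\)-score first. The sketch below also shows a third route, asking qnorm() for the upper tail directly with lower.tail = FALSE; the variable names are chosen here for illustration.

```r
# Three equivalent ways to find the cutoff for the top 1% of N(10, 2).
mu <- 10
sigma <- 2

by_hand    <- mu + sigma * qnorm(0.99)                # un-standardize the Z-score
direct     <- qnorm(0.99, mean = mu, sd = sigma)      # specify mean and sd
upper_tail <- qnorm(0.01, mean = mu, sd = sigma,
                    lower.tail = FALSE)               # area 0.01 to the right

round(c(by_hand, direct, upper_tail), 4)              # each rounds to 14.6527
```

The small gap between 14.652 and 14.6527 in the two solutions comes from rounding the \(Z\)-score to 2.326 before multiplying.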

Postscript

Code for a standardized scores example

library(ggplot2)  # provides ggplot() and geom_histogram() used below

df <- read.csv("https://mphitchman.com/stats/data/normal_act_data.csv")
# get a sense of the shape of the length and width distributions
ggplot(df)+
  geom_histogram(aes(x=width),col="white",fill="chocolate",binwidth=.2,alpha=.4)+
  geom_histogram(aes(x=length),col="white",fill="purple",binwidth=.2,alpha=.4)+
  xlab("width (brown) and length (purple)")+
  ggtitle("Length and Width Distributions")

# create a standardized score column for the length:
df$Z_L <- (df$length - mean(df$length)) / sd(df$length)

# plot the standardized lengths
ggplot(df)+geom_histogram(aes(x=Z_L),col="white",fill="purple",binwidth=.2,alpha=.4)+
  xlab("length standardized scores")+ggtitle("Distribution of length Z-scores")

# create a function for drawing a sample of size n from the population of all rectangles.
draw.sample<-function(df,n){
  chosen.rows<-sample(1:nrow(df),n)
  return(df[chosen.rows,])
}

# draw a sample of size 100
df.sample <- draw.sample(df, 100)

# What colors appeared in the sample?
table(df.sample$color)

# plot the lengths in the sample
ggplot(df.sample)+
  geom_histogram(aes(length),col="white",fill="purple",binwidth=.2,alpha=.4)
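The standardized score calculation can be cross-checked against base R's scale() function. The sketch below uses simulated lengths rather than the CSV file, so the vector name and the simulation parameters are assumptions made for illustration only.

```r
# Standardize by hand and with scale(), on simulated data.
set.seed(1)                                  # make the simulation reproducible
lengths <- rnorm(200, mean = 5, sd = 1.5)    # simulated stand-in for df$length

z_by_hand <- (lengths - mean(lengths)) / sd(lengths)
z_scale   <- as.numeric(scale(lengths))      # scale() returns a one-column matrix

all.equal(z_by_hand, z_scale)                # the two methods agree

# Standardized scores always have mean 0 and standard deviation 1.
c(mean = mean(z_by_hand), sd = sd(z_by_hand))
```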