Suppose \(Z\) is standard normal. That is, \(Z\) follows a normal distribution with mean 0 and standard deviation 1, so its density curve is the familiar bell curve centered at 0.
R has nice built-in functions to answer two questions:
What is \(P(Z \leq z)\)? In other words, what is the area under the density curve to the left of \(Z = z\)? This is a question of cumulative probability.
What is the value of \(Z\) below which one finds a particular proportion of the density curve? This is a question of finding a quantile.
pnorm()
Use the pnorm() function!
The pnorm(z) function gives the area to the left of \(z\) under the \(N(0,1)\) density curve.
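As a quick sanity check on the "area to the left" description, the sketch below compares pnorm() with a numerical integral of the density. It uses the base R functions dnorm() and integrate(), which are not otherwise used in this section, and the variable names are only illustrative.

```r
# Area to the left of 0 is half of the total area under the curve.
p_zero <- pnorm(0)

# Numerically integrate the N(0,1) density from -Inf up to 1.54
# and compare the result with pnorm(1.54).
area_by_integration <- integrate(dnorm, lower = -Inf, upper = 1.54)$value
area_by_pnorm <- pnorm(1.54)

# By symmetry of the bell curve, the area left of -1.54
# equals the area right of 1.54.
left_of_minus_z <- pnorm(-1.54)
right_of_z <- 1 - pnorm(1.54)

print(c(p_zero, area_by_integration, area_by_pnorm))
print(all.equal(left_of_minus_z, right_of_z))
```

The two areas should agree up to the numerical error of integrate(), which is a reasonable way to convince yourself that pnorm() really is the cumulative area.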
Example: Determine \(P(Z \leq 1.54)\).
Answer:
pnorm(1.54)
## [1] 0.9382198
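Figure: the standard normal density curve split at \(z = 1.54\). The area to the left is pnorm(1.54) = 0.9382, and the area to the right is 1-pnorm(1.54) = 0.0618.
Example: Determine \(P(Z > 0.6)\).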
Answer:
1-pnorm(0.6)
## [1] 0.2742531
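An alternative to subtracting from 1 is the lower.tail argument of pnorm(), which asks directly for the area to the right. The short sketch below compares the two approaches; the variable names are only illustrative.

```r
# Upper-tail area computed by subtracting the left area from 1.
upper_by_subtraction <- 1 - pnorm(0.6)

# Same area requested directly with lower.tail = FALSE.
upper_by_argument <- pnorm(0.6, lower.tail = FALSE)

print(c(upper_by_subtraction, upper_by_argument))
```

Both values should match the answer above. The lower.tail = FALSE form can be slightly more accurate far out in the tail, where 1 - pnorm(z) loses precision.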
Example: Find \(P(-0.23 < Z < 1.54)\).
Answer:
pnorm(1.54)-pnorm(-0.23)
## [1] 0.5291739
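The same subtraction can be wrapped in a small helper for any interval. In the sketch below, prob_between() is a hypothetical helper written for this illustration (it is not a built-in R function), and the result is checked against a numerical integral of the density using integrate() and dnorm().

```r
# Hypothetical helper: P(a < X < b) for a normal distribution,
# computed as (area left of b) minus (area left of a).
prob_between <- function(a, b, mean = 0, sd = 1) {
  stopifnot(a <= b)
  pnorm(b, mean = mean, sd = sd) - pnorm(a, mean = mean, sd = sd)
}

p <- prob_between(-0.23, 1.54)

# Independent check: integrate the N(0,1) density over the same interval.
check <- integrate(dnorm, lower = -0.23, upper = 1.54)$value

print(c(p, check))
```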
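Why does pnorm(1.54)-pnorm(-0.23) do the trick? pnorm(1.54) is the area to the left of 1.54 and pnorm(-0.23) is the area to the left of -0.23, so their difference is the area between -0.23 and 1.54.
qnorm()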
Use the qnorm() function!
The qnorm(a) function gives the value \(L\) such that \(P(Z\leq L)=a.\)
Example: Determine the value \(L\) in \(N(0,1)\) that marks the 90th percentile. That is, find \(L\) such that the area to the left of \(L\) under the standard normal bell curve equals 0.9.
Answer:
qnorm(0.9)
## [1] 1.281552
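Since qnorm() answers the reverse question of pnorm(), the two functions should undo each other. The sketch below checks this in both directions; the variable names are only illustrative.

```r
# Start from a proportion, find the cutoff, then recover the proportion.
L <- qnorm(0.9)
area_left_of_L <- pnorm(L)

# Start from a cutoff, find the proportion, then recover the cutoff.
z <- 1.54
recovered_z <- qnorm(pnorm(z))

print(c(L, area_left_of_L, recovered_z))
```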
Fact: If \(X\) is \(N(\mu,\sigma)\) then \[Z = \frac{X-\mu}{\sigma} \text{ is } N(0,1).\]
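One way to see this fact in action is by simulation. The sketch below draws values from \(N(10,2)\) with rnorm() (a base R function not otherwise used in this section), standardizes them, and checks that the results behave like \(N(0,1)\). The seed, the sample size, and the choice of \(N(10,2)\) are arbitrary assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Simulate many values of X from N(10, 2), then standardize them.
x <- rnorm(100000, mean = 10, sd = 2)
z <- (x - 10) / 2

# The standardized values should have mean near 0 and sd near 1.
z_mean <- mean(z)
z_sd <- sd(z)

# The proportion of standardized values at or below 1 should be
# close to the N(0,1) area to the left of 1.
prop_below_1 <- mean(z <= 1)
area_below_1 <- pnorm(1)

print(c(z_mean, z_sd, prop_below_1, area_below_1))
```

Because this is a simulation, the numbers agree only approximately, and the agreement tightens as the sample size grows.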
Example: Suppose \(X\) is \(N(10,2)\), and we want to compute \(P(X < 9.1)\).
Solution 1: First convert to a \(Z\)-score. The \(Z\)-score for 9.1 in \(N(10,2)\) is \[z = \frac{9.1-10}{2} = -0.45.\] So \[P(X < 9.1)=P(Z < -0.45),\] and \(P(Z < -0.45)\) equals pnorm(-0.45):
pnorm(-0.45)
## [1] 0.3263552
Solution 2 in R: Specify the mean \(\mu\) and standard deviation \(\sigma\) in the pnorm() command and use the original value of \(X\); there is no need to convert to \(Z\)-scores:
pnorm(9.1, mean=10, sd=2)
## [1] 0.3263552
Example: What value of \(X\) marks the top 1% of the \(N(10,2)\) distribution?
Solution 1: First, find the \(Z\)-score in \(N(0,1)\) that marks the top 1% of that distribution. This is the value that has area 0.99 to the left of it:
qnorm(.99)
## [1] 2.326348
Then, find the value \(L\) in the given \(N(10,2)\) distribution that has this \(Z\)-score by solving this equation for \(L\):
\[\begin{align*} \frac{L-10}{2}&=2.326 \\ L-10 &= 2\cdot 2.326 \\ L-10 &= 4.652 \\ L &= 10 + 4.652 \\ L &= 14.652 \end{align*}\]
Solution 2 in R: We can specify the mean and standard deviation for any normal distribution within the qnorm() function, without first converting the question to one about \(Z\)-scores:
qnorm(.99,mean=10,sd=2)
## [1] 14.6527
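The two solutions are the same computation: the direct qnorm() call amounts to \(\mu + \sigma \cdot z\), where \(z\) is the standard normal cutoff. The sketch below checks this without rounding \(z\), which is why it gives 14.6527 rather than the 14.652 obtained by hand with \(z\) rounded to 2.326. The variable names are only illustrative.

```r
mu <- 10
sigma <- 2

# Standard normal cutoff with area 0.99 to its left.
z_cut <- qnorm(0.99)

# "Un-standardize" by hand: L = mu + sigma * z.
L_by_hand <- mu + sigma * z_cut

# Ask qnorm() for the cutoff in N(10, 2) directly.
L_direct <- qnorm(0.99, mean = mu, sd = sigma)

# Round trip: the area to the left of L in N(10, 2) should be 0.99.
area_left <- pnorm(L_direct, mean = mu, sd = sigma)

print(c(L_by_hand, L_direct, area_left))
```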
library(ggplot2)  # needed for ggplot() and geom_histogram()

df <- read.csv("https://mphitchman.com/stats/data/normal_act_data.csv")

# get a sense of the shape of the length and width distributions
ggplot(df) +
  geom_histogram(aes(x = width), col = "white", fill = "chocolate", binwidth = .2, alpha = .4) +
  geom_histogram(aes(x = length), col = "white", fill = "purple", binwidth = .2, alpha = .4) +
  xlab("width (brown) and length (purple)") +
  ggtitle("Length and Width Distributions")

# create a standardized score column for the length
df$Z_L <- (df$length - mean(df$length)) / sd(df$length)

# plot the standardized lengths
ggplot(df) +
  geom_histogram(aes(x = Z_L), col = "white", fill = "purple", binwidth = .2, alpha = .4) +
  xlab("length standardized scores") +
  ggtitle("Distribution of length Z-scores")

# create a function for drawing a sample of size n (without replacement)
# from the population of all rectangles
draw.sample <- function(df, n) {
  chosen.rows <- sample(1:nrow(df), n)
  return(df[chosen.rows, ])
}

# draw a sample of size 100!
df.sample <- draw.sample(df, 100)

# What colors appeared in the sample?
table(df.sample$color)

# histogram of the lengths in the full data set
ggplot(df) +
  geom_histogram(aes(length), col = "white", fill = "purple", binwidth = .2, alpha = .4)