7 Important Discrete Random Variables
In this chapter we introduce the following well-known discrete random variables: binomial, geometric, Poisson, negative binomial, and hypergeometric. In the examples we work through, it will be convenient from time to time to compute probabilities in R. Appendix C contains details about the R commands useful for doing so.
7.1 Binomial Distributions
It all begins with a Bernoulli trial.
Definition 7.1 A Bernoulli trial is a chance experiment with two distinct possible outcomes, “success” and “failure”. Typically, we let \(p\) denote the probability of success, and \(q\) denote the probability of failure (where \(q = 1 - p\)).
Some examples:
- Roll a 6-sided die and define success to be “roll a 4”, and failure to be “don’t roll a 4”. Here \(p = 1/6\) and \(q = 5/6\).
- Pick a name out of a hat with \(n\) names. Success: pick Oriana’s name; Failure: do not pick Oriana’s name. Here \(p = 1/n\) and \(q = (n-1)/n\).
- Test a person for a particular disease. In medical tests such as these, “success” is often used to describe a positive test (meaning the person tests positive for the disease), and “failure” means the person tests negative for the disease.
Definition 7.2 (Binomial Distribution) Define the random variable \(X\) to equal the number of successes in \(n\) independent, identical Bernoulli trials with probability of success on any given trial equal to \(p\). Then \(X\) is called a binomial distribution with parameters \(n\) and \(p\), and \(X\) is denoted \(\texttt{binom}(n,p)\).
The space of \(X\) equals \(\{0,1,\ldots,n\},\) and for \(x = 0,1, \ldots, n,\) \[p(x) = \binom{n}{x}p^x(1-p)^{n-x}.\]
Does this probability distribution make sense? To find the probability of exactly \(x\) successes in \(n\) independent Bernoulli trials, we first choose which \(x\) of the \(n\) trials will be successes (and we have \(\binom{n}{x}\) choices here). Then, we know the probability of success on each of these \(x\) trials is \(p,\) and the probability of failure on each of the other \(n-x\) trials is \(q = 1-p\). Since the trials are independent, the probability of getting exactly \(x\) successes and \(n-x\) failures is the product \(\binom{n}{x}p^x(1-p)^{n-x}.\)
Furthermore, by the Binomial Theorem (4.1), \[\sum_{x = 0}^n p(x) = (p+(1-p))^n = 1.\]
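In R, the function `dbinom(x, n, p)` returns \(p(x)\) for the \(\texttt{binom}(n,p)\) distribution, so we can also check this numerically for a particular choice of \(n\) and \(p\) (say \(n = 15\) and \(p = 1/4\), values we will meet again shortly):

```r
# The binomial probabilities p(0), ..., p(n) sum to 1 (illustrated with n = 15, p = 1/4)
sum(dbinom(0:15, size = 15, prob = 1/4))   # returns 1
```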
Theorem 7.1 If \(X\) is \(\texttt{binom}(n,p),\) \[E(X) = np~~~\text{ and }~~~ V(X) = np(1-p).\]
Proof. From the definition of expected value,
\[\begin{align*} E(X) &= \sum_{x=0}^n x \cdot p(x) \\ &= \sum_{x=1}^n x \cdot p(x) & \text{since the } x = 0 \text{ term is } 0 \\ &= \sum_{x=1}^n x \binom{n}{x} p^x (1-p)^{n-x} \\ &= \sum_{x=1}^n x \frac{n!}{(n-x)!x!} p^x (1-p)^{n-x} \\ &= \sum_{x=1}^n \frac{n!}{(n-x)!(x-1)!} p^x (1-p)^{n-x} &\text{cancelling an } x\\ &= np \sum_{x=1}^n \frac{(n-1)!}{(n-x)!(x-1)!}p^{x-1}(1-p)^{n-x} &\text{ pull out }n\text{ from }n!\text{ and one }p\\ &= np \sum_{x=1}^n \binom{n-1}{x-1}p^{x-1}(1-p)^{(n-1)-(x-1)} \end{align*}\]
Hey! The summation term equals 1 since it is the sum of all the probabilities in a \(\texttt{binom}(n-1,p)\) distribution!
Thus \(E(X) = np\).
To prove the \(V(X)\) formula, it is helpful to first observe the following
\[E(X(X-1)) = E(X^2-X) = E(X^2)-E(X),\] so \[E(X^2) = E(X(X-1))+E(X).\] We computed \(E(X)\) above, and \(E(X(X-1))\) can be determined similarly:
\[\begin{align*} E(X(X-1)) &= \sum_{x=0}^n x(x-1)\binom{n}{x}p^x(1-p)^{n-x} \\ &= \sum_{x=2}^n x(x-1)\binom{n}{x}p^x(1-p)^{n-x} \\ &= \sum_{x=2}^n x(x-1)\frac{n!}{(n-x)!x!}p^x(1-p)^{n-x} \\ &= \sum_{x=2}^n \frac{n!}{(n-x)!(x-2)!}p^x(1-p)^{n-x} \\ &= n(n-1)p^2 \sum_{x=2}^n \frac{(n-2)!}{(n-x)!(x-2)!}p^{x-2}(1-p)^{n-x} \\ &= n(n-1)p^2 \sum_{x=2}^n\binom{n-2}{x-2}p^{x-2}(1-p)^{(n-2)-(x-2)} \end{align*}\]
This summation also equals one, since it is the sum of all the probabilities in a \(\texttt{binom}(n-2,p)\) distribution, so \(E(X(X-1)) = n(n-1)p^2,\) and it follows that \[E(X^2) = E(X(X-1))+E(X) = n(n-1)p^2 + np.\]
Finally, \[\begin{align*} V(X) &= E(X^2) - E(X)^2 \\ &= n(n-1)p^2 + np - (np)^2 \\ &= n^2p^2 - np^2 + np - n^2p^2 \\ &= np(1-p) \end{align*}\]
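As a sanity check on these formulas, we could also compute the mean and variance directly from their definitions for a particular case, say \(n = 15\) and \(p = 1/4\):

```r
# Check E(X) = np and V(X) = np(1-p) directly for n = 15, p = 1/4
n <- 15; p <- 1/4; x <- 0:n
EX <- sum(x * dbinom(x, n, p))
EX                                  # 3.75, which is np
sum((x - EX)^2 * dbinom(x, n, p))   # 2.8125, which is np(1-p)
```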
Example 7.1 (Guessing on a multiple choice test) A multiple choice exam has 15 questions. Each question has 4 possible answers. If a student answers each question with a random guess, what is the probability they will score 10 or higher?
Let \(X\) = the score on the test. Then, \(X\) is \(\texttt{binom}(n=15,p=1/4)\).
Now, \[P(X \geq 10) = \sum_{x = 10}^{15} \binom{15}{x}\left(\frac{1}{4}\right)^x\left(\frac{3}{4}\right)^{15-x} \approx .0008.\]
Calculating this sum in R
R has a nice command for calculating cumulative probabilities of the form \(P(X \leq x)\). If \(X\) is \(\texttt{binom}(n,p),\) then \(P(X \leq x)\) is calculated in R by `pbinom(x,n,p)`. So, in the case of the multiple choice test, \(P(X \geq 10) = 1 - P(X \leq 9) =\) `1-pbinom(9,15,1/4)`. See Appendix C for more information on using R to calculate probabilities for the important discrete distributions we encounter in this class.
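For instance, here is one way to carry out the calculation for the guessing student, both by summing the probabilities directly and by using the complement:

```r
# P(X >= 10) when guessing on all 15 questions, X ~ binom(15, 1/4)
sum(dbinom(10:15, size = 15, prob = 1/4))   # direct sum, about .0008
1 - pbinom(9, size = 15, prob = 1/4)        # complement of the CDF, same value
```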
Back to the test, they have less than a 1 in a thousand chance of scoring a 10 or better if they are truly guessing. Note also that the mean for \(X\) is \(E(X) = 15 \cdot \frac{1}{4} = 3.75\).
What if they can eliminate one of the choices on each problem, and randomly guess between the remaining three choices on each problem? Are they likely to do better? If we let \(Y\) denote the score on the test following this approach, then \(Y\) is \(\texttt{binom}(n=15,p=1/3),\) and the probability of scoring 10 or greater ends up about 10 times better than it was before, but still minuscule:
\[P(Y \geq 10) = \sum_{y = 10}^{15} \binom{15}{y}\left(\frac{1}{3}\right)^y\left(\frac{2}{3}\right)^{15-y} \approx .0085.\] The mean score is now \(E(Y) = 15\cdot\frac{1}{3} = 5.\) Ok, not great, but it is better, I guess.
Example 7.2 (Pay the meter?) This example is adapted from an exercise in the Grinstead-Snell text. Flint never puts money in a 25-cent parking meter downtown. He assumes that there is a probability of .03 that he will be caught. The first offense costs nothing, the second costs 10 dollars, and subsequent offenses cost 25 dollars each.
How does the expected cost of parking 100 times without paying the meter compare with the cost of paying the meter each time?
Assume parking events are independent, identical Bernoulli trials with probability \(p = .03\) of getting a ticket. Then the random variable \(X\) counting the number of tickets in 100 trials is \(\texttt{binom}(100,.03),\) and we note that \(E(X) = 100\cdot(.03) = 3\).
In deciding whether to pay the meter, one idea is to consider the cost associated with the expected number of tickets, which would be $35. This amount is higher than the $25 Flint would pay by chucking in a quarter each time. But this approach doesn’t give the full picture.
Instead, let’s determine the expected cost associated with parking 100 times without paying the meter. If Flint never pays the meter, the parking cost \(C\) of these 100 trials is the following function of \(X\):
\[ C(x)= \begin{cases} 0 &\text{ if }x = 0,1 \\ 10 &\text{ if } x = 2 \\ 10+25(x-2) &\text{ if } x \geq 3 \end{cases} \]
Then the expected cost associated with these 100 trials, \(E(C),\) is \[\begin{align*} E(C) &= \sum_{x=0}^{100} C(x)\cdot p(x)\\ &= C(2)\cdot p(2) + \sum_{x=3}^{100} (10 + 25\cdot(x-2))\cdot p(x) \\ &= 10\cdot p(2) + \sum_{x=3}^{100} (25x - 40)\cdot p(x) \\ &= 10\cdot p(2) + \left(E(25X-40) - \sum_{x=0}^{2} (25x - 40)\cdot p(x)\right)\\ &= 10\cdot p(2) + [25\cdot E(X) - 40] - \left(-40\cdot p(0) - 15 \cdot p(1) + 10\cdot p(2)\right)\\ &= [25 E(X) - 40] + 40 \cdot p(0) + 15 \cdot p(1) \\ &= [(25 \cdot 3) - 40] + 40\cdot(.97)^{100} + 15\cdot 100(.03)(.97)^{99} \\ &\approx 39.10. \end{align*}\]
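We can double-check this figure by computing the sum \(\sum_x C(x)\, p(x)\) directly in R:

```r
# Expected cost (in dollars) of parking 100 times without paying, X ~ binom(100, .03)
x <- 0:100
cost <- ifelse(x <= 1, 0, ifelse(x == 2, 10, 10 + 25 * (x - 2)))
sum(cost * dbinom(x, size = 100, prob = 0.03))   # about 39.10
```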
Yes, Flint is better off putting a quarter in the meter each time for a cost of $25 in parking. But never tell Flint the odds.
Example 7.3 (Drilling for Oil) An oil exploration firm in the 1970s wants to finance 10 drilling explorations. They figure each exploration has a probability of success (finding oil) equal to 0.1, and that the 10 operations are independent (success in one is independent of success in any other). The company has $50,000 in fixed costs prior to doing its first exploration, and anticipates a cost of $150,000 for each unsuccessful exploration, and a cost of $300,000 for each successful exploration. Find the expected total cost to the firm for its 10 explorations.
Let \(X\) = number of successful explorations. Then \(X\) is \(\texttt{binom}(10,.1),\) and \(E(X) = 10 \cdot .1 = 1.\) The cost \(C\) (in thousands of dollars) can be expressed as a linear function of \(X\):
\[\begin{align*} C &= 50 + 150(10-X)+300X\\ &= 1550 + 150X. \end{align*}\]
It follows from properties of expected value that \[\begin{align*} E(C) &= E(1550 + 150X)\\ &= E(1550) + 150E(X)\\ &= 1550 + 150 \cdot 1 \\ &= 1700. \end{align*}\] So the expected cost is $1.7 million.
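We can also check this in R by computing \(E(C)\) directly from the distribution of \(X\):

```r
# E(C) with C = 1550 + 150*X (cost in $1000s), X ~ binom(10, .1)
x <- 0:10
sum((1550 + 150 * x) * dbinom(x, size = 10, prob = 0.1))   # 1700
```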
Example 7.4 (AI Generated Example) In a galaxy not so far away, there is a soccer ball factory run by enthusiastic, if somewhat clumsy, aliens. The factory manager, Zorg, claims that only 5% of the soccer balls they produce end up being “special” (read: defective). The company’s quality control inspector, an alien named Blurp, is highly skeptical and decides to randomly select 20 soccer balls from a day’s production to test this claim.
Questions:
- If Zorg’s claim is correct, what is the probability that exactly 2 of the 20 selected soccer balls are “special”?
- If Zorg’s claim is correct, what is the probability that at most 1 of the 20 selected soccer balls is “special”?
- If Zorg’s claim is correct, what is the probability that more than 3 of the 20 selected soccer balls are “special”?
Solution:
Let \(X\) denote the number of “special” soccer balls in the randomly selected set of 20 balls. Then, \(X\) is \(\texttt{binom}(n=20,p=.05)\).
The first question asks for \(P(X = 2),\) which is \[P(X = 2) = \binom{20}{2} p^2 \cdot (1-p)^{18} \approx .189.\]
The second question asks for \[P(X \leq 1) = \sum_{x = 0}^1 \binom{20}{x}p^x(1-p)^{20-x},\]
and using R, we see that \(P(X \leq 1)\) = `pbinom(1,20,.05)` \(\approx\) 0.736.
The third question asks for \[P(X > 3) = 1 - P(X \leq 3),\] which we can evaluate in R with `1-pbinom(3,20,.05)` \(\approx\) 0.016.
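All three answers can be obtained in R with `dbinom` and `pbinom`:

```r
dbinom(2, size = 20, prob = 0.05)       # P(X = 2),  about .189
pbinom(1, size = 20, prob = 0.05)       # P(X <= 1), about .736
1 - pbinom(3, size = 20, prob = 0.05)   # P(X > 3),  about .016
```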
Example 7.5 Suppose \(X\) is \(\texttt{binom}(60,1/4)\). Then \(\mu = 60\cdot\frac{1}{4} = 15\) and \(\sigma^2 = 60\cdot \frac{1}{4} \cdot \frac{3}{4} = 11.25.\) Tchebysheff says at least 75% of the distribution is within 2 standard deviations of the mean, so in this case, at least 75% of the distribution is between \(15 - 2\cdot \sqrt{11.25}\) and \(15 + 2\cdot \sqrt{11.25},\) or between 8.3 and 21.7. The actual percentage is closer to 95%, and it can be found by summing all \(p(x)\) for \(x\) between 8.3 and 21.7. This sum is calculated in R below.
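Summing \(p(x)\) over the integers \(x = 9, 10, \ldots, 21,\) one way to do it is:

```r
# P(9 <= X <= 21) for X ~ binom(60, 1/4)
sum(dbinom(9:21, size = 60, prob = 1/4))
```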
## [1] 0.9489842
7.2 Geometric Distributions
Suppose we have a sequence of Bernoulli trials, independent, with probability of success \(p\) on each trial. Let \(X\) equal the number of trials up to and including the trial of the first success. Then \(X\) is called a geometric distribution with parameter \(p\), denoted \(\texttt{geom}(p)\).
The probability function for \(X\) is given by \[p(x) = q^{x-1}\cdot p,\] for \(x = 1, 2, 3, \ldots,\) where, again, \(q = 1-p\).
Note that \[\sum_{x=1}^\infty p(x) = p + pq + pq^2 + \cdots\] is a geometric series with \(|q| < 1,\) so it converges. Moreover, by the geometric series formula, \[\sum_{n=0}^\infty pq^n = \frac{p}{1-q} = \frac{p}{p} = 1.\] Thus, all the \(p(x)\) sum to 1, and we have a valid probability function.
More generally, for any non-negative integer \(k,\) \[\begin{align*} P(X > k) &= P(X=k+1)+P(X=k+2)+P(X=k+3)+\cdots \\ &= pq^k + pq^{k+1}+pq^{k+2}+\cdots \\ &= pq^k(1 + q + q^2 + \cdots)\\ &= pq^k\cdot\frac{1}{1-q}\\ &= q^k. \end{align*}\]
The geometric distribution can be useful when modeling the behavior of lines. For instance, suppose a line of cars waits to pay their parking fee as they exit the airport. It can be reasonable to assume that over a short interval of time (say 10 seconds), the probability that a car arrives is \(p,\) and the probability that a car does not arrive is \(q = 1-p.\) Then the time \(T\) until the next arrival has a geometric distribution, and by the remark above, the probability that no car arrives in the next \(k\) time units is \(P(T > k) = q^k\).
Theorem 7.2 If \(X\) is \(\texttt{geom}(p)\) for \(0 < p \leq 1,\) then \[E(X) = \frac{1}{p}~~~\text{ and }~~~ V(X) = \frac{1-p}{p^2}.\]
Before proving this theorem we consider the geometric series one more time.
Geometric Series Intermission
From Calc II we know that the geometric series \(\sum q^x\) converges if and only if \(-1 < q < 1,\) and that \[\sum_{x = 0}^\infty q^x=\frac{1}{1-q} \tag{provided |q|<1}\] Thinking of \(q\) as a variable, we can differentiate each side with respect to \(q,\) and the resulting infinite series will still converge for \(-1 < q < 1\).
\[\begin{align*} \frac{d}{dq}\left[\sum_{x = 0}^\infty q^x\right] &= \frac{d}{dq}\left[\frac{1}{1-q}\right] \\ \frac{d}{dq}\left[1+q+q^2 + q^3 + \cdots \right] &= \frac{d}{dq}\left[\frac{1}{1-q}\right] \\ \left[0 + 1 + 2q + 3q^2 + \cdots\right] &= \frac{1}{(1-q)^2}\\ \sum_{x = 1}^\infty x q^{x-1} &= \frac{1}{(1-q)^2}. \end{align*}\]
This ends the geometric series intermission.
Proof. By definition of expected value, \[\begin{align*} E(X) &= \sum_{x=1}^\infty x \cdot q^{x-1}\cdot p \\ &= p \sum_{x=1}^\infty x \cdot q^{x-1}. \end{align*}\] But this series is exactly the one for which we found a formula in the intermission above, so \[E(X) = p \cdot \frac{1}{(1-q)^2} = \frac{1}{p}.\]
We leave the proof that \(V(X) = \frac{1-p}{p^2}\) as an exercise.
Example 7.6 (Message Received)
Assume that, during each second, my junk box receives one email with probability .01 and no email with probability .99. Determine the probability that I will not receive a junk email in the next 10 minutes.
We let \(X\) count the number of seconds (from now) until the next email arrives in my junk box, and we assume \(X\) is \(\texttt{geom}(p = .01)\).
Then the probability of not receiving a junk email in the next ten minutes is \[P(X > 600) = q^{600} \approx .0024.\] So you’re saying there’s a chance!
We also note that \(E(X) = 1/p = 100:\) on average, the next junk email arrives in 100 seconds.
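If we want to check this in R, note that R's geometric functions (`dgeom`, `pgeom`) count the number of failures before the first success rather than the total number of trials, so \(P(X > 600)\) corresponds to more than 599 failures:

```r
# P(no junk email arrives in the next 600 seconds)
1 - pgeom(599, prob = 0.01)   # X > 600 trials <=> more than 599 failures
0.99^600                      # same value, directly from P(X > k) = q^k
```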
7.3 Negative Binomial Distribution
Suppose we have a sequence of independent Bernoulli trials, each having probability of success \(p,\) and we want to know how many trials it takes to get our \(r\)th success, where \(r \geq 1\).
For instance, if we set \(r = 3,\) then \(X = 7\) for the following sequence of Bernoulli trials (\(S\) stands for success, \(F\) for failure) \[F F F S F S S F S F F S \ldots \] since the third \(S\) occurs on the 7th trial.
Let \(X\) equal the number of trials up to and including the trial of the \(r\)th success. Then \(X\) is called a negative binomial distribution with parameters \(r\) and \(p\), denoted \(\texttt{nbinom}(r,p)\).
The space of \(X\) is \(\{r, r+1, r+2, \ldots\}\) and for \(x\) in the space of \(X\) the probability function for \(X\) is given by \[p(x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}.\]
Note that if \(r=1\) we just have the friendly geometric distribution.
Theorem 7.3 If \(X\) is \(\texttt{nbinom}(r,p)\) where \(0 < p \leq 1\) then \[E(X) = \frac{r}{p} ~~~ \text{ and } ~~~ V(X) = \frac{r(1-p)}{p^2}.\]
Example 7.7
If we roll 2 dice and record the sum, how many rolls, on average, will it take to get our 4th 8?
Well, the probability of rolling a sum of 8 is \(p = 5/36\) by our 6x6 dice grid in Example 3.3, and the random variable \(X\) counting the number of rolls until we get our 4th 8 is \(\texttt{nbinom}(r=4,p=5/36),\) so \[E(X) = \frac{4}{5/36} = 28.8.\] Would you want to make a bet that it will take me more than 25 rolls to get my 4th 8?
We note that \[V(X) = \frac{4\cdot (31/36)}{(5/36)^2} \approx 178.6,\] so the standard deviation is \(\sigma = \sqrt{V(X)} \approx 13.36\).
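R's negative binomial functions (`dnbinom`, `pnbinom`, `rnbinom`) count the number of failures before the \(r\)th success rather than the total number of trials, so we add \(r\) to convert. For instance, here is a quick simulation check of the mean, along with the probability relevant to the bet above:

```r
r <- 4; p <- 5/36
set.seed(2024)
mean(rnbinom(100000, size = r, prob = p) + r)   # simulated E(X), close to 28.8
1 - pnbinom(25 - r, size = r, prob = p)         # P(it takes more than 25 rolls)
```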
7.4 Hypergeometric Distribution
Here’s the scene: We have a finite population with \(N\) total elements, and this population can be partitioned into two distinct groups, where
- group 1 has \(m\) elements, and
- group 2 has \(n\) elements, (so \(m + n = N\)).
Think: a box of \(N\) marbles, \(m\) of them are orange and \(n\) of them are green.
Suppose we draw a random sample without replacement of size \(k\) from the population.
Let \(X\) equal the number of elements in the sample of size \(k\) that belong to group 1.
Then \(X\) is called a hypergeometric distribution with parameters \(m, n,\) and \(k\), denoted \(\texttt{hyper}(m,n,k)\). The space of \(X\) is either \(x = 0, 1, \ldots, k\) if \(m \geq k,\) or \(0, 1, \ldots, m\) if \(m < k\).
For each \(x\) in the space of \(X,\) \[p(x) = \frac{\binom{m}{x}\binom{n}{k-x}}{\binom{N}{k}},\] where \(N = m + n\).
The classic “good potatoes/bad potatoes” Example 4.12 has a hypergeometric distribution.
Theorem 7.4 If \(X\) is \(\texttt{hyper}(m,n,k)\) then \[E(X) = k\cdot\frac{m}{N}~~~\text{ and }~~~ V(X) = k \left(\frac{m}{N}\right)\left(\frac{n}{N}\right)\left(\frac{N-k}{N-1}\right),\] where \(N = m + n\).
Example 7.8
Let’s say a bag of 120 skittles has 30 orange ones. If we pick 10 at random, what is the probability that we get more than 5 orange ones?
Let \(X\) denote the number of orange skittles in our sample. Then \(X\) is \(\texttt{hyper}(30,90,10),\) and \[P(X>5) = \sum_{x=6}^{10} \frac{\binom{30}{x}\binom{90}{10-x}}{\binom{120}{10}} \approx .0153.\] We note that in this example \(E(X) = 10 \cdot \frac{30}{120} = 2.5.\)
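R's `phyper` function uses this same \((m, n, k)\) parametrization, so we can check the tail probability directly:

```r
# P(X > 5) for X ~ hyper(m = 30, n = 90, k = 10)
1 - phyper(5, m = 30, n = 90, k = 10)   # about .0153, as computed above
```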
7.5 Poisson Distribution
The Scene
The Poisson probability distribution can provide a good model for the number of occurrences \(X\) of a rare event in time, space, or some other unit of measure. A Poisson random variable \(X\) has one parameter, \(\lambda,\) which is the average number of occurrences of the rare event in the indicated time (or space, etc.)
Some examples that might be well-modeled by a Poisson distribution:
- the number of customers going through a check-out lane in a grocery store per hour;
- the number of typos per page in a book;
- the number of goals scored in a World Cup soccer game;
- the number of chocolate chips per cookie in a big batch;
- the number of pitches per baseball in a baseball game.
Definition 7.3 (Poisson Distribution) A random variable \(X\) has a Poisson probability distribution with parameter \(\lambda > 0,\) denoted \(\texttt{Poisson}(\lambda)\), if and only if \[p(x) = \frac{\lambda^x e^{-\lambda}}{x!} ~~~\text{ for } x = 0,1,2, \ldots.\]
We will take the time below to explain how this probability function can actually model such things, but first let’s check that it is a valid probability function, and then find \(E(X)\) and \(V(X)\).
First, since \(\lambda > 0,\) each \(p(x)\) is non-negative. Also, recall the Calc II power series formula for \(e^\lambda\) for any real number \(\lambda\) is \[e^\lambda = \sum_{k=0}^\infty \frac{\lambda^k}{k!},\] so we can see that all the probabilities in this distribution sum to 1: \[\begin{align*} \sum_{x=0}^\infty \frac{\lambda^x e^{-\lambda}}{x!} &= e^{-\lambda} \sum_{x=0}^\infty \frac{\lambda^x}{x!} \\ &= e^{-\lambda}\cdot e^{\lambda} \\ &= 1. \end{align*}\]
Theorem 7.5 If \(X\) is \(\texttt{Poisson}(\lambda),\) \(E(X) = \lambda\) and \(V(X) = \lambda\).
Proof. We tackle the mean first.
\[\begin{align*} E(X) &= \sum_{x=0}^\infty x \cdot p(x) \\ &= \sum_{x=1}^\infty x \cdot \frac{\lambda^x e^{-\lambda}}{x!} & \text{since the }x=0 \text{ term is } 0\\ &= e^{-\lambda} \sum_{x=1}^\infty \frac{\lambda^x}{(x-1)!} & \text{since } \frac{x}{x!}=\frac{1}{(x-1)!}\\ &= \lambda e^{-\lambda} \sum_{x=1}^\infty \frac{\lambda^{x-1}}{(x-1)!} & \text{pulling out one }\lambda \\ &= \lambda e^{-\lambda} \sum_{k=0}^\infty \frac{\lambda^{k}}{k!} & \text{letting } k = x-1\\ &= \lambda e^{-\lambda}\cdot e^{\lambda} &\text{power series for }e^\lambda\\ &= \lambda. \end{align*}\]
Thus, \(\mu = \lambda\).
For \(V(X),\) we first find \(E(X(X-1))\) in much the same way as we found \(E(X)\):
\[\begin{align*} E(X(X-1)) &= \sum_{x=0}^\infty x(x-1)\cdot \frac{\lambda^x e^{-\lambda}}{x!} \\ &= e^{-\lambda} \sum_{x=2}^\infty \frac{x(x-1)}{x!} \cdot \lambda^x & \text{since }x=0,1 \text{ terms are }0. \\ &= e^{-\lambda} \sum_{x=2}^\infty \frac{1}{(x-2)!} \cdot \lambda^x\\ &= \lambda^2 e^{-\lambda} \sum_{x=2}^\infty \frac{1}{(x-2)!} \cdot \lambda^{x-2} & \text{pulling out }\lambda^2 \\ &= \lambda^2 e^{-\lambda} \sum_{k=0}^\infty \frac{1}{(k)!} \cdot \lambda^{k} &\text{letting }k=x-2\\ &= \lambda^2 e^{-\lambda} \cdot e^{\lambda} & \text{power series for }e^\lambda\\ &= \lambda^2. \end{align*}\]
Finally, we find \(V(X)\) using our expectation shortcuts:
\[\begin{align*} V(X) &= E(X^2) - \mu^2 \\ &= [E(X(X-1))+E(X)] - \mu^2 \\ &= [\lambda^2 + \lambda] - \lambda^2 \\ &= \lambda. \end{align*}\]
Example 7.9 Suppose \(X\) gives the number of typos per page in a large printed manuscript, and \(X\) is Poisson with \(\lambda = 2\). Find the probability that a randomly chosen page has (a) fewer than 2 typos, and (b) more than 5 typos.
Part (a) asks for
\[\begin{align*} P(X\leq 1) &= P(X = 0)+P(X=1) \\ &= \frac{2^0e^{-2}}{0!}+\frac{2^1e^{-2}}{1!}\\ &= e^{-2} + 2e^{-2} \\ &=3 e^{-2} \\ &\approx 0.406. \end{align*}\]
Part (b) asks for
\[\begin{align*} P(X > 5) &= 1 - P(X \leq 5) \\ &= 1 - \left[ \frac{2^0e^{-2}}{0!}+\frac{2^1e^{-2}}{1!} + \cdots + \frac{2^5e^{-2}}{5!}\right] \end{align*}\]
Rather than calculate this by hand, we turn to R, and the command `ppois(k,lambda)`, which returns \(P(X \leq k)\) when \(X\) is \(\texttt{Poisson}(\lambda)\). So \(P(X > 5) = 1 - P(X \leq 5)\) = `1 - ppois(5,2)` \(\approx\) 0.017.
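Both parts of this example fit in two lines of R:

```r
ppois(1, lambda = 2)       # P(X <= 1), about 0.406
1 - ppois(5, lambda = 2)   # P(X > 5),  about 0.017
```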
Example 7.10 The number of website views at a seldom visited website is Poisson with an average number of 8 visits per day.
What is the probability that the site gets 20 or more visits in a day?
We want \(P(X \geq 20),\) an infinite sum, so we use the strategy of finding the complement, with the help of the function `ppois()` (Appendix C) in R for the calculation:
\[\begin{align*} P(X \geq 20) &= 1 - P(X \leq 19) \\ &= 1 - \texttt{ppois}(19,8)\\ &\approx .00025. \end{align*}\]
7.5.1 Poisson Process
If we’re interested in modelling the number of instances of some rare event over a time interval, we can imagine subdividing the interval into \(n\) small pieces, small enough that at most 1 instance can occur in each subinterval. In fact, we can imagine each subinterval constitutes a Bernoulli trial of sorts:
- the probability of 1 instance in a subinterval (“success”) equals \(p\);
- the probability of 0 instances (“failure”) equals \(1-p\).
Then, if \(X\) equals the number of instances in the original interval, \(X\) is binomial on \(n\) trials with probability of success \(p\) on each trial, assuming identical, independent Bernoulli trials. As we increase \(n,\) breaking the original time period into smaller and smaller subintervals, the corresponding probability \(p\) of seeing 1 instance of the event in a subinterval will decrease, but what if \(n \cdot p\) remains constant? In this case, let \(\lambda = np\) and consider what happens to the probability function for the \(\texttt{binom}(n,p)\) distribution:
\[\begin{align*} \lim_{n \to \infty} \binom{n}{x}p^x(1-p)^{n-x} &= \lim_{n \to \infty} \frac{n(n-1) \cdots (n - x + 1)}{x!} \left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x} \\ &=\lim_{n \to \infty} \frac{\lambda^x}{x!}\cdot\frac{n(n-1) \cdots (n - x + 1)}{n^x}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}\\ &=\frac{\lambda^x}{x!}\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x} \cdot \frac{n}{n} \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-x+1}{n}. \end{align*}\]
Now, we have a limit of a product of many terms. If the limit of each term exists, then the overall limit will be the product of each of the individual limits.
First, observe that \((1-\lambda/n)^n \to e^{-\lambda}\) as \(n \to \infty\) by one of the greatest limits in mathematics: \[\lim_{n \to \infty}\left(1+\frac{x}{n}\right)^n = e^x\] for any real number \(x\).
Second, observe that \((1-\lambda/n)^{-x} \to 1^{-x} = 1\) since \(\lambda/n \to 0\) as \(n \to \infty\).
Finally, for any \(k \geq 0,\) \(\lim_{n \to \infty} \frac{n-k}{n} = 1\) so each of the ratios from the third term on converges to 1.
So, if \(np = \lambda\) remains constant as \(n \to \infty,\) the \(\texttt{binom}(n,p)\) probabilities approach \(\frac{\lambda^x e^{-\lambda}}{x!},\) which is exactly the \(\texttt{Poisson}(\lambda)\) probability function. So the Poisson distribution can be a good model for counting instances of some rare event.
The process described above, where subdivision of the interval leads to Bernoulli trials in such a way that \(np\) remains constant, is called a Poisson process.
Definition 7.4 The process by which an event happens is called a Poisson process if the following holds:
- The dimension over which \(X\) is measured can be subdivided into \(n\) small pieces, within which the event can occur at most once.
- In each small piece, the probability of seeing one occurrence is the same, say \(p,\) and \(p\) is proportional to the length of the sub-interval (as \(n\) grows, \(p\) shrinks, but \(np\) remains constant).
- Occurrences in all the small pieces are independent from one another.
In the limit derivation above we demonstrated the following:
For large \(n,\) if we let \(\lambda = np,\) then \(\texttt{binom}(n,p) \approx \texttt{Poisson}(\lambda)\).
For instance, in the table below we compare the probabilities for \(\texttt{binom}(10,.4),\) \(\texttt{binom}(40,.1),\) and \(\texttt{binom}(400,.01)\) with those of a \(\texttt{Poisson}(4)\) distribution:
x | binom(10,.4) | binom(40,.1) | binom(400,.01) | Poisson(4) |
---|---|---|---|---|
0 | 0.0060 | 0.0148 | 0.0180 | 0.0183 |
1 | 0.0403 | 0.0657 | 0.0725 | 0.0733 |
2 | 0.1209 | 0.1423 | 0.1462 | 0.1465 |
3 | 0.2150 | 0.2003 | 0.1959 | 0.1954 |
4 | 0.2508 | 0.2059 | 0.1964 | 0.1954 |
5 | 0.2007 | 0.1647 | 0.1571 | 0.1563 |
6 | 0.1115 | 0.1068 | 0.1045 | 0.1042 |
7 | 0.0425 | 0.0576 | 0.0594 | 0.0595 |
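The table can be reproduced in R with `dbinom` and `dpois`:

```r
# Comparing binomial probabilities (with np = 4) to Poisson(4) probabilities
x <- 0:7
round(cbind(x,
            binom_10  = dbinom(x, 10, 0.4),
            binom_40  = dbinom(x, 40, 0.1),
            binom_400 = dbinom(x, 400, 0.01),
            Poisson_4 = dpois(x, 4)), 4)
```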
Example 7.11 Industrial accidents occur according to a Poisson process with an average of 3 accidents per month. During the last 2 months 10 accidents occurred. Does this number seem highly improbable if the mean number of accidents per month is still equal to 3? Does it indicate a genuine increase in the mean number of accidents per month?
If \(X\) equals the number of accidents in two months, then \(X\) is Poisson with mean \(\lambda = 6\).
Then \(P(X \geq 10) = 1 - P(X \leq 9) =\) `1-ppois(9,6)` \(\approx\) 0.084.
Let’s consider this result. If the mean number of accidents per month is still 3, we would expect to observe 10 or more accidents over a two month period about 8 times out of 100, which is unlikely, but not extremely unlikely. Better safe than sorry: I would send out a safety memo and closely monitor what unfolds over the next month!
Now, if we had 5 more accidents the next month, giving us 15 over a 3 month window, the chance of 15 or more over 3 months is \(P(X \geq 15),\) where \(X\) is \(\texttt{Poisson}(9)\). Using R, this probability is `1-ppois(14,9)` \(\approx\) 0.041.
So we have about a 4% chance of seeing 15 or more over a 3 month window if the mean number of accidents per month is 3. I would investigate whether some practice has changed to make accidents more likely than they used to be.
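The two tail probabilities used in this example come from `ppois`:

```r
1 - ppois(9, lambda = 6)    # P(X >= 10) over two months, about 0.084
1 - ppois(14, lambda = 9)   # P(X >= 15) over three months, about 0.041
```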
Example 7.12 For a certain type of soil the number of wireworms per cubic foot has a Poisson distribution with mean of 100.
Give an interval that captures at least 5/9ths of the distribution.
OK, this feels like a job for Tchebysheff.
Here \(X\) is Poisson with parameter \(\lambda = 100,\) so \(\mu = 100\) and \(\sigma = \sqrt{100} = 10.\)
Thinking of Tchebysheff’s inequality, let’s find \(k\) so that \(1 - 1/k^2 = 5/9\). Then \[P(|X-\mu|<k\sigma)\geq 5/9,\] meaning at least 5/9ths of the distribution is in the interval \((\mu - k\sigma,\mu+k\sigma)\).
Solving \(1 - \frac{1}{k^2} = \frac{5}{9}\) for \(k>0\) yields \(k = \frac{3}{2},\) so the interval is 85 to 115 wireworms.
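For comparison, the actual probability that \(X\) lands strictly between 85 and 115 can be found with `ppois`, and it is comfortably more than the Tchebysheff lower bound of 5/9:

```r
# P(85 < X < 115) for X ~ Poisson(100)
ppois(114, lambda = 100) - ppois(85, lambda = 100)
```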