3 Discrete Probability Distributions

In this chapter we develop basic notions of a probability model for a chance experiment.

3.1 Sample Space

A chance experiment is some repeatable process whose outcome on any given trial cannot be known ahead of time. Here are a few examples of chance experiments:

  1. Flip a coin.
  2. Roll a 6-sided die.
  3. Flip a coin three times.
  4. Shoot free throws until we’ve made three.
  5. Count “scintillations” in 72-second intervals caused by radioactive decay of a quantity of polonium (Rutherford and Geiger).

Definition 3.1 The sample space of a chance experiment is the set of possible basic outcomes of the experiment. The elements of a sample space are called sample points or simple events, and any subset of a sample space is called an event.

We often have some choice in how to record the possible outcomes of a chance experiment. For instance, we might record the sample spaces for the experiments above as follows:

  1. \(S = \{ H, T \}\) (\(H\) for heads, \(T\) for tails).
  2. \(S = \{1,2,3,4,5,6\}\) (recording the value that is face up after rolling the die).
  3. \(S = \{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}\) (record the result of each flip in order). Alternatively, we might just record how many heads we flipped, in which case \(S = \{0,1,2,3\},\) but we lose some information about the experiment in doing so.
  4. \(S = \{111, 1101, 1011, 0111, 11001, 10101, 01101, 10011, 01011, 00111, \ldots \},\) where 0 represents missing a shot, and 1 represents making a shot.
  5. \(S = \{0, 1, 2, 3, 4, \ldots \}\).
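Although the free-throw sample space in item 4 is infinite, we can still list its elements one at a time, in order of the number of shots taken. The Python sketch below is one way to do this; the generator name `free_throw_outcomes` is a hypothetical choice of ours, and within each length it may list the sequences in a different order than item 4 does. It uses the fact that the last shot must be the third make, so an outcome with \(n\) shots is determined by where the first two makes fall among the first \(n-1\) shots.

```python
from itertools import combinations, count, islice

def free_throw_outcomes():
    """Yield sample points for 'shoot until three makes', shortest first.

    '1' = made shot, '0' = missed shot; every outcome ends with the third make.
    """
    for n in count(3):  # n = total number of shots taken
        # choose positions of the first two makes among the first n - 1 shots
        for makes in combinations(range(n - 1), 2):
            shots = ["0"] * n
            for i in makes:
                shots[i] = "1"
            shots[-1] = "1"  # the final shot is always the third make
            yield "".join(shots)

# the first ten sample points: all outcomes with 3, 4, or 5 shots
print(list(islice(free_throw_outcomes(), 10)))
```

Because the generator never stops, we use `islice` to take only finitely many sample points, which mirrors the “\(\ldots\)” at the end of the listing in item 4.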

In the first three examples, \(S\) is a finite set, while \(S\) appears to be an infinite set in the last two examples. There is, of course, a limit to how many free throws I can attempt in my life (if I shoot one free throw every 15 seconds for 100 years, that’s only about 210 million attempts :)), but, in the context of building a probability model to describe the chance experiment of shooting free throws until I’ve made three, I have no reason to limit how many attempts I need to get that done.

Although infinite, the sample spaces in the last two examples are countably infinite. Recall that a set is countably infinite if its elements can be listed in a sequence, i.e., put in one-to-one correspondence with the positive integers.

Definition 3.2 The sample space of a chance experiment is called discrete if the sample space is finite or countably infinite.

If you asked me to pick a random real number from the unit interval \(I = [0,1],\) that would be a chance experiment with an uncountable sample space, which we do not consider in this chapter. We study such experiments in Chapter 9.

Definition 3.3 Given a chance experiment with discrete sample space \(S,\) a probability distribution function on the elements of \(S\) is a real-valued function \(m\) which satisfies these two conditions:

  1. \(m(s) \geq 0\) for all \(s \in S,\) and
  2. \(\displaystyle \sum_{s \in S} m(s) = 1.\)

We define the probability of any event \(E\) of \(S\) to be \[P(E) = \sum_{s \in E} m(s).\]

Let’s consider our first three chance experiments once more.

  1. If we flip a fair coin once, then \(S = \{H,T\},\) and it is reasonable to assign the probabilities \[m(H) = \frac{1}{2}, ~ m(T) = \frac{1}{2}.\]

  2. If a 6-sided die is balanced, it is reasonable to assign the probabilities \[m(i) = \frac{1}{6}\] for each \(i = 1, 2, 3, 4, 5, 6\). If we let \(E\) be the event that we roll a prime number, then \(E = \{2, 3, 5\}\) and \[P(E) = \sum_{s \in E} m(s) = m(2) + m(3) + m(5) = 3/6 = 1/2.\]

  3. If we flip a fair coin 3 times, it seems reasonable that each of the 8 possible sequences of three flips in \(S\) is equally likely, so we can assign the probability distribution function \(m(s) = 1/8\) for each element \(s \in S\).
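Definition 3.3 translates directly into code. In the Python sketch below (the names `S`, `m`, and `prob` are illustrative choices of ours), we store \(m\) as a dictionary for the balanced die, check the two conditions of the definition, and compute \(P(E)\) for the “roll a prime” event by summing \(m(s)\) over \(s \in E\). We use `Fraction` so the arithmetic is exact rather than floating point.

```python
from fractions import Fraction

# Sample space for one roll of a balanced 6-sided die
S = [1, 2, 3, 4, 5, 6]

# Probability distribution function m: each face gets probability 1/6
m = {s: Fraction(1, 6) for s in S}

# The two conditions of Definition 3.3
assert all(m[s] >= 0 for s in S)   # m(s) >= 0 for every s in S
assert sum(m.values()) == 1        # the m(s) sum to 1

def prob(event, m):
    """P(E): the sum of m(s) over the sample points s in the event E."""
    return sum(m[s] for s in event)

E = {2, 3, 5}        # the event "roll a prime number"
print(prob(E, m))    # 1/2
```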

In the case of a countably infinite sample space (such as shooting free throws until we’ve made three), defining a valid probability distribution function requires more care: checking that the values \(m(s)\) sum to 1 requires evaluating an infinite series.
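To see what such an infinite series looks like, suppose (an assumption the text does not make) that each free throw is made independently with the same probability \(p\); the Python sketch below uses the hypothetical value \(p = 0.7\). A sample point with \(k\) misses contains exactly three makes, so under this assumption it gets \(m(s) = p^3 (1-p)^k\), and there are \(\binom{k+2}{2}\) such sequences (the last shot is the third make, and the other two makes fall somewhere among the first \(k+2\) shots). The helper `partial_sum` is a hypothetical name; it adds up \(m(s)\) over all sample points with at most a given number of misses, and the partial sums creep up toward 1.

```python
from math import comb

p = 0.7      # hypothetical probability of making any single free throw
q = 1 - p    # probability of missing a single free throw

def partial_sum(max_misses):
    """Sum m(s) over all sample points with at most max_misses misses."""
    total = 0.0
    for k in range(max_misses + 1):
        # number of sequences with exactly k misses: place the first two
        # makes among the first k + 2 shots (the last shot is the third make)
        n_sequences = comb(k + 2, 2)
        total += n_sequences * p**3 * q**k
    return total

for n in (0, 5, 10, 20, 50):
    print(n, partial_sum(n))
```

The partial sums only approach 1 in the limit, which is exactly why a countably infinite sample space calls for an infinite-series argument rather than a finite check.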

3.2 Discrete Random Variables

Definition 3.4 A discrete random variable is a real-valued function defined over a discrete sample space. We usually let \(X\) or \(Y\) denote a random variable. Given random variable \(X,\) the space of \(X\) is the set of possible values of \(X\).

Example 3.1 (Flip a coin 3 times) Consider the experiment of flipping a coin three times. We record as much information as possible about this experiment by providing the sequence of the results of the three flips. Thus, the sample space for this experiment is: \[S = \{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}.\]

We might be interested in knowing how many times we flipped heads, or perhaps we want to know whether we ever flipped heads twice in a row. We can use random variables to keep track of these sorts of things.

Let \[X = \text{the number of heads in three flips}.\] Note that the space of \(X\) is the set \(\{0, 1, 2, 3\}\) (we can get anywhere between 0 and 3 heads in 3 flips).

Or, if we’re interested in whether we ever flipped consecutive heads in our 3 flips, we could let \[Y = \begin{cases} 1 & \text{if we ever flipped consecutive heads} \\ 0 & \text{else.} \end{cases}\] The space of \(Y\) is \(\{0,1\}\).

Again, formally, the random variables \(X\) and \(Y\) are functions whose inputs are elements in \(S,\) and whose outputs are real numbers. We can display these functions in table form when the sample space is small, as in Table 3.1.

Table 3.1: Random variables X and Y associated to the experiment of flipping a coin 3 times.
S HHH HHT HTH THH HTT THT TTH TTT
X 3 2 2 2 1 1 1 0
Y 1 1 0 1 0 0 0 0
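Since \(X\) and \(Y\) are just functions from \(S\) to the real numbers, we can write them as ordinary functions. The Python sketch below (function names `X` and `Y` chosen to match the text) builds the eight sequences and prints the value of each random variable at each sample point, reproducing the columns of Table 3.1, though `product` lists the sequences in a different order than the table does.

```python
from itertools import product

# Sample space: all sequences of three flips, as strings like "HTH"
S = ["".join(flips) for flips in product("HT", repeat=3)]

def X(s):
    """Number of heads in the sequence s."""
    return s.count("H")

def Y(s):
    """1 if s contains two consecutive heads, 0 otherwise."""
    return 1 if "HH" in s else 0

for s in S:
    print(s, X(s), Y(s))
```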

If \(X\) is a random variable associated to an experiment, and we have a probability distribution function assigned to the sample space \(S,\) we can naturally ask about the probability that \(X\) takes on a particular value \(x\).

Definition 3.5 The probability that a random variable \(X\) takes on value \(x,\) denoted \(P(X = x)\) or \(p(x),\) is defined as the sum of the probabilities of all sample points in \(S\) that are assigned the value \(x\). The function \(p(x)\) is called the distribution function of the discrete random variable \(X,\) and the probability distribution of \(X\) refers to the list of possible values for \(x\) along with their associated probabilities \(p(x)\) (usually given as a table or function).

Example 3.2 (Flip a coin 3 times (Part II)) Consider again the “flip a coin three times” Example 3.1 and the associated random variables \(X\) and \(Y,\) which counted the number of heads flipped, and whether we flipped consecutive heads, respectively. Table 3.1 provides the values for these random variables.

We assume \(m(s) = 1/8\) for each \(s \in S\) (all 8 sequences are equally likely), so we have the following probability distributions:

\[ \begin{array}{c|c|c|c|c} x & 0 & 1 & 2 & 3 \\ \hline p(x) & 1/8 & 3/8 & 3/8 & 1/8 \end{array} \] and

\[ \begin{array}{c|c|c} y & 0 & 1 \\ \hline p(y) & 5/8 & 3/8 \end{array} \]
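Definition 3.5 says to group sample points by the value the random variable assigns them and add up their probabilities. The Python sketch below does exactly that; the helper name `distribution` is our own illustrative choice. It recovers both tables above from \(m(s) = 1/8\).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

S = ["".join(flips) for flips in product("HT", repeat=3)]
m = {s: Fraction(1, 8) for s in S}   # all 8 sequences equally likely

def distribution(rv, m):
    """p(x) = sum of m(s) over all sample points s with rv(s) == x."""
    p = defaultdict(Fraction)        # Fraction() is 0, the starting total
    for s, prob in m.items():
        p[rv(s)] += prob
    return dict(sorted(p.items()))

def X(s):
    return s.count("H")              # number of heads

def Y(s):
    return 1 if "HH" in s else 0     # 1 if there are consecutive heads

print(distribution(X, m))
print(distribution(Y, m))
```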

Example 3.3 (Rolling Two Dice) The chance experiment of rolling two regular 6-sided dice is a staple of the board game industry. A convenient way to describe the sample space in this setting is to treat the dice as distinct (say, one red die and one blue die), and write down all possible pairs of values \((r,b)\) where \(r\) is the red die value, \(b\) is the blue die value. The sample space for rolling two 6-sided dice thus has 36 elements, which we can describe via a \(6 \times 6\) grid.

Table 3.2: The sample space for rolling two dice (rows give the red die value \(r\), columns the blue die value \(b\))
1 2 3 4 5 6
1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

We may be interested in \(X,\) the sum of the two dice. The \(6 \times 6\) grid is handy for representing this random variable:

Table 3.3: X, the sum of two dice
1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

Assuming the probability of each element in \(S\) is 1/36, the probability distribution for \(X\) is

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c|c|c} x & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline p(x) & 1/36 & 2/36 & 3/36 & 4/36 & 5/36 & 6/36 & 5/36 & 4/36 & 3/36 & 2/36 & 1/36 \end{array} \] More succinctly, we have \[p(x) = \frac{6-|x-7|}{36} ~~~\text {for } x= 2, 3, \ldots, 12.\]
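A quick way to double-check this table and the closed-form expression is to enumerate all 36 pairs \((r, b)\), count how many give each sum, and divide by 36. The Python sketch below does this with `Counter` and then compares each \(p(x)\) against \((6 - |x - 7|)/36\).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (r, b) for the red and blue dice
S = list(product(range(1, 7), repeat=2))

# Each pair has probability 1/36, so p(x) = (number of pairs with r + b = x) / 36
counts = Counter(r + b for r, b in S)
p = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

# Compare with the closed form p(x) = (6 - |x - 7|) / 36
for x, px in p.items():
    assert px == Fraction(6 - abs(x - 7), 36)

print(p)
```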

Maybe we’re interested in how far apart the two values are, so we consider the random variable \(Y\) equal to the absolute value of the difference of the two dice:

Table 3.4: Y, the absolute value of the difference of two dice
1 2 3 4 5 6
1 0 1 2 3 4 5
2 1 0 1 2 3 4
3 2 1 0 1 2 3
4 3 2 1 0 1 2
5 4 3 2 1 0 1
6 5 4 3 2 1 0

So, the probability distribution for \(Y\) is

\[ \begin{array}{c|c|c|c|c|c|c} y & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline p(y) & 6/36 & 10/36 & 8/36 & 6/36 & 4/36 & 2/36 \end{array} \]
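The same enumeration works for \(Y = |r - b|\). The Python sketch below first prints the grid of Table 3.4 row by row (red die down the side, blue die across the top) and then tallies each value of \(Y\) over the 36 equally likely pairs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

dice = range(1, 7)

# Reproduce the grid of Table 3.4: row r (red die), column b (blue die)
for r in dice:
    print(r, *[abs(r - b) for b in dice])

# Tally Y = |r - b| over all 36 equally likely pairs (r, b)
counts = Counter(abs(r - b) for r, b in product(dice, repeat=2))
p = {y: Fraction(c, 36) for y, c in sorted(counts.items())}
print(p)
```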

3.3 Calculating Probabilities

Recall the scene:

  1. We conduct a chance experiment, to which we associate the sample space \(S\) of possible outcomes.
  2. To each sample point \(s\) in \(S\) we assign a reasonable probability, \(m(s),\) that \(s\) occurs (being sure that all \(m(s)\) are non-negative and that they sum to 1).
  3. For any event \(A\) associated to this experiment (i.e., \(A\) is a subset of \(S\)), we define \(\displaystyle P(A) = \sum_{s \in A} m(s).\)
  4. For a random variable \(X\) associated to \(S,\) \(P(X = x)\) equals the sum of the \(m(s)\) for which \(s\) is assigned value \(x\).

3.3.1 Sample Point Method

So far we have been finding probability distributions by following what is called the sample-point method (list all the sample points, assign probabilities to each, and go!).

Here’s one more example of finding probabilities via the sample-point method.

Example 3.4 (Random Phones)

Four phones are found in a classroom after class. The professor returns them at random to the four students the next class. Let \(X\) denote the number of students who receive the correct phone. Let’s determine the probability distribution for \(X\) by the sample-point method.

The chance experiment here is straightforward: randomly return 4 phones to the 4 students who own them. We list the basic outcomes as follows:

  • Name the students “a”, “b”, “c”, and “d”, and name their phones by the same letter (student “a” owns phone “a”, etc).
  • Return the phones randomly to the students so that “a” receives the first phone, “b” the second, and so on.
  • Record the results of the experiment by writing down the phone names in the order in which they were returned.
  • For instance, recording “c b a d” would mean student \(a\) received phone \(c,\) student \(b\) received phone \(b\) (their own phone!), student \(c\) received phone \(a,\) and student \(d\) received their own phone, \(d\).

In this way, the 24 different permutations of the letters “a b c d” listed in Table 3.5 correspond to the 24 basic outcomes possible in this experiment. For each basic outcome in the table we also record \(X,\) the number of students to receive their own phone for that basic outcome.

Table 3.5: Returning 4 phones at random, X counts how many students receive their own phone.
a b c d X a b c d X
a b c d 4 b a c d 2
a b d c 2 b a d c 0
a c b d 2 b c a d 1
a c d b 1 b c d a 0
a d b c 1 b d a c 0
a d c b 2 b d c a 1
c a b d 1 c b a d 2
c a d b 0 c b d a 1
c d a b 0 c d b a 0
d a b c 0 d b a c 1
d a c b 1 d b c a 2
d c a b 0 d c b a 0

If the professor truly returns the phones at random, each of the 24 possible outcomes is equally likely. In other words, for each element \(s\) in the sample space \(S,\) \(m(s) = 1/24\). It follows that the probability distribution for \(X\) is

\[ \begin{array}{c|c|c|c|c|c} x & 0 & 1 & 2 & 3 & 4\\ \hline p(x) & 9/24 & 8/24 & 6/24 & ~~0~~ & 1/24 \end{array} \]

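So the most likely outcome of returning the phones at random is that no one gets their phone back (probability 9/24 = 0.375), and there is only about a 4 percent chance (1/24 ≈ 0.042) that everyone gets their phone back. Note also that \(X = 3\) is impossible: if three students have their own phones, the remaining phone must belong to the fourth student.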
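With only 24 outcomes, a computer can carry out the sample-point method for us. The Python sketch below generates the permutations of “a b c d” with `itertools.permutations`, computes \(X\) for each one (the helper `num_correct` is a hypothetical name), and assigns each outcome probability 1/24, recovering the distribution above.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

students = "abcd"   # student "a" owns phone "a", and so on

def num_correct(arrangement):
    """X: how many students receive their own phone in this arrangement."""
    return sum(1 for student, phone in zip(students, arrangement) if student == phone)

outcomes = list(permutations(students))   # the 24 basic outcomes
counts = Counter(num_correct(s) for s in outcomes)

# Each outcome has m(s) = 1/24; Counter returns 0 for values never seen (x = 3)
p = {x: Fraction(counts[x], len(outcomes)) for x in range(5)}
print(p)
```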

This sample-point method for determining probabilities will not be much help if we have a huge sample space, and huge sample spaces arise easily, such as in a friendly game of cards. We examine 5-card poker hands later, beginning with Example 4.7, but mention here that a player can be dealt about 2.6 million possible 5-card hands from a regular 52-card deck. So, in an effort to determine the probability of obtaining a particular type of hand, say three of a kind, I will not be using the sample-point method!

We have two alternatives to the sample-point method:

  • simulation (draw 5 cards at random many, many times, and see how often you get three of a kind).
  • learn counting techniques in Chapter 4!!
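Here is a minimal simulation sketch in Python. It assumes the usual poker meaning of three of a kind: exactly three cards of one rank, with the other two cards of two different ranks (so full houses and four of a kind do not count). Cards are encoded as hypothetical `(rank, suit)` pairs of integers, and the helper names `is_three_of_a_kind` and `estimate` are our own. Each trial deals 5 cards without replacement using `random.Random.sample`.

```python
import random
from collections import Counter

RANKS = range(13)   # 13 ranks (2 through ace), encoded as 0..12
SUITS = range(4)    # 4 suits, encoded as 0..3
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

def is_three_of_a_kind(hand):
    """Exactly three cards of one rank; the other two of two different ranks."""
    rank_counts = sorted(Counter(rank for rank, _ in hand).values())
    return rank_counts == [1, 1, 3]

def estimate(trials, seed=None):
    """Fraction of random 5-card hands that are three of a kind."""
    rng = random.Random(seed)
    hits = sum(is_three_of_a_kind(rng.sample(DECK, 5)) for _ in range(trials))
    return hits / trials

print(estimate(100_000, seed=1))
```

For comparison, the counting techniques of Chapter 4 give an exact answer of 54,912 three-of-a-kind hands out of 2,598,960, or about 0.0211; a simulation with 100,000 hands should typically land within a few thousandths of that value.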