Slides: Distribution of Discrete Data
DISTRIBUTIONS OF DISCRETE DATA
ST101 – DR. ARIC LABARR
WHAT ARE DISTRIBUTIONS?
A random variable is a numerical description of the outcome of an experiment.
They can be either discrete or continuous.
A discrete random variable may assume either a finite number of values or an infinite sequence of values.
RANDOM VARIABLES
Finite example: Let x be the number of TVs sold in one day at a small department store, where x can only take the values {0, 1, 2, 3, 4, 5}.
Infinite example: Let x be the number of customers arriving in one day at a small department store where x can take the values of 0, 1, 2, …
A continuous random variable may assume any numerical value in an interval or collection of intervals.
DISCRETE VS. CONTINUOUS
Discrete Example:
Let x be the number of individuals living in a home.
Continuous Example:
Let x be the distance in miles from home to the store.
A random variable is a numerical description of the outcome of an experiment.
A discrete random variable may assume either a finite number of values or an infinite sequence of values.
A continuous random variable may assume any numerical value in an interval or collection of intervals.
SUMMARY
DISCRETE PROBABILITY DISTRIBUTIONS
The probability distribution for a random variable describes how probabilities are distributed over the values of the random variable.
Essentially, it tells us how frequently each value of the variable occurs.
PROBABILITY DISTRIBUTION
Frequency – number of observations in each category in the data set
Relative Frequency – proportion of total observations contained in a given category
Cumulative Frequency – number of observations with values less than or equal to the upper limit of the category
Cumulative Relative Frequency – proportion of observations with values less than or equal to the upper limit of the category
NOTATION
Relative frequencies can be used as estimates of the probability of an event occurring.
Probability distributions for discrete random variables are best described with tables, graphs, or equations.
PROBABILITY DISTRIBUTION
Let x be the number of TVs sold in one day at a small department store, where x can only take the values {0, 1, 2, 3, 4, 5}.
Let’s examine the past year of data.
DISCRETE PROBABILITY EXAMPLE
TVs Sold   Number of Days (Freq)   Cumulative Frequency   Relative Frequency
0          90                      90                     0.25
1          85                      175                    0.23
2          70                      245                    0.19
3          45                      290                    0.12
4          50                      340                    0.14
5          25                      365                    0.07
Total      365                                            1.00
The relative frequencies are the probabilities!
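The frequency columns of the TV example can be reproduced with a few lines of Python. This is only a sketch: the variable names (`freq`, `cum_freq`, `rel_freq`) are illustrative and not part of the slides, and the day counts are the ones given in the example.

```python
from itertools import accumulate

# Number of days on which 0, 1, ..., 5 TVs were sold (from the example).
freq = {0: 90, 1: 85, 2: 70, 3: 45, 4: 50, 5: 25}

total_days = sum(freq.values())                        # 365 days in the year
cum_freq = list(accumulate(freq.values()))             # running totals
rel_freq = [count / total_days for count in freq.values()]

# Print one row per value of x: x, frequency, cumulative frequency, relative frequency.
for x, count, cum, rel in zip(freq, freq.values(), cum_freq, rel_freq):
    print(f"{x}  {count:3d}  {cum:3d}  {rel:.2f}")
```

Rounded to two decimals, the relative frequencies match the last column of the table and sum to 1.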
The probability distribution for a random variable describes how probabilities are distributed over the values of the random variable.
Frequency is the number of observations in each category in the data set.
Relative frequency is the proportion of total observations contained in a given category.
Cumulative frequency is the number of observations with values less than or equal to the upper limit of the category.
Cumulative relative frequency is the proportion of observations with values less than or equal to the upper limit of the category.
SUMMARY
EXPECTED VALUE AND VARIANCE
The expected value, or mean, of a random variable is a measure of its central location.
It is defined by:
$E(x) = \mu = \sum x \, f(x)$
where f(x) is the probability that the random variable takes the value x.
Think about the expected value as a weighted mean, where the probability function serves as the weight.
EXPECTED VALUE
Let x be the number of TVs sold in one day at a small department store, where x can only take the values {0, 1, 2, 3, 4, 5}.
Let’s examine the past year of data.
DISCRETE PROBABILITY EXAMPLE
TVs Sold (x)   Number of Days (Freq)   Cumulative Frequency   f(x)   x·f(x)
0              90                      90                     0.25   0.00
1              85                      175                    0.23   0.23
2              70                      245                    0.19   0.38
3              45                      290                    0.12   0.36
4              50                      340                    0.14   0.56
5              25                      365                    0.07   0.35
Total          365                                            1.00   1.88
We expect to sell 1.88 TVs per day on average.
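The weighted-mean calculation can be checked with a short Python sketch. The dictionary name `f` is an illustrative choice, and the probabilities are the rounded relative frequencies from the example rather than the exact fractions.

```python
# Probability of selling x TVs in a day (rounded relative frequencies from the example).
f = {0: 0.25, 1: 0.23, 2: 0.19, 3: 0.12, 4: 0.14, 5: 0.07}

# Expected value: weight each value x by its probability and add the products.
expected_value = sum(x * p for x, p in f.items())

print(round(expected_value, 2))  # 1.88
```

Using the exact fractions (for example 90/365 instead of 0.25) would give a slightly different answer, about 1.877, because the slide works with rounded probabilities.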
The variance of a random variable is a measure of its variability/spread.
It is defined by:
$\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$
The standard deviation, $\sigma$, is the square root of the variance.
VARIANCE
Let x be the number of TVs sold in one day at a small department store, where x can only take the values {0, 1, 2, 3, 4, 5}.
We expect to sell 1.88 TVs per day on average.
DISCRETE PROBABILITY EXAMPLE
TVs Sold (x)   Number of Days (Freq)   f(x)   x·f(x)   x − μ   (x − μ)²   (x − μ)²·f(x)
0              90                      0.25   0.00     -1.88   3.53       0.883
1              85                      0.23   0.23     -0.88   0.77       0.177
2              70                      0.19   0.38     0.12    0.01       0.002
3              45                      0.12   0.36     1.12    1.25       0.150
4              50                      0.14   0.56     2.12    4.49       0.629
5              25                      0.07   0.35     3.12    9.73       0.681
Total          365                     1.00   1.88                        2.522
The variance of daily sales is 2.522 TVs squared.
The standard deviation of daily sales is 1.588 TVs.
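The variance and standard deviation can be sketched in Python in the same way: take the squared deviations from the mean and weight them by the probabilities. The names `f`, `mu`, `variance`, and `std_dev` are illustrative. Note that this sketch does not round the intermediate columns, so it gives a variance of about 2.526 rather than the 2.522 obtained from the rounded table.

```python
import math

# Probability of selling x TVs in a day (rounded relative frequencies from the example).
f = {0: 0.25, 1: 0.23, 2: 0.19, 3: 0.12, 4: 0.14, 5: 0.07}

# Mean: probability-weighted average of the values.
mu = sum(x * p for x, p in f.items())

# Variance: probability-weighted average of the squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in f.items())

# Standard deviation: square root of the variance, in the original units (TVs).
std_dev = math.sqrt(variance)

print(f"mean={mu:.2f}  variance={variance:.3f}  std dev={std_dev:.3f}")
```

The small gap between the two answers comes only from rounding; the method is the same as in the table.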
The expected value, or mean, of a random variable is a measure of its central location: $E(x) = \mu = \sum x \, f(x)$
The variance of a random variable is a measure of its variability/spread: $\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$
SUMMARY
BINOMIAL DISTRIBUTION
EXAMPLE REVIEW
Example: you have a 2-step random process where you flip a coin twice (independent flips):
[Tree diagram: the first flip gives Heads or Tails; each outcome is followed by a second flip that again gives Heads or Tails.]
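The four paths through the two-flip tree can be listed programmatically. The sketch below assumes a fair coin (probability 1/2 for each side) and uses exact fractions; the names `p` and `heads_dist` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so each side has probability 1/2.
p = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

heads_dist = {}
for first, second in product("HT", repeat=2):
    prob = p[first] * p[second]              # independent flips: probabilities multiply
    heads = (first + second).count("H")      # number of heads on this path
    heads_dist[heads] = heads_dist.get(heads, Fraction(0)) + prob

for heads in sorted(heads_dist):
    print(heads, heads_dist[heads])
```

Two of the four equally likely paths contain exactly one head, which is why one head is twice as likely as zero or two heads.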
There are 4 properties of a binomial experiment:
The experiment consists of a sequence of n identical trials. (2 coin flips)
Only two outcomes, success or failure, are possible on each trial. (H or T)
The probability of a success, denoted as p, does not change from trial to trial. (0.5)
The trials are independent. (Independent coin flips)
BINOMIAL EXPERIMENT
The binomial distribution looks at the probabilities of the number of successes occurring in the n trials.
We use x to denote the number of successes occurring in the n trials.
For example:
Let a success be rolling a die and getting a 2.
We roll the die n = 10 times and successfully roll a 2 on x = 3 of them.
We are interested in the probability that exactly 3 of the 10 rolls equal 2.
BINOMIAL DISTRIBUTION
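The binomial probability function is defined as:
$f(x) = \binom{n}{x} \, p^x (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}$
Number of outcomes providing exactly x successes in n trials: $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$
Probability of a particular sequence of trial outcomes with x successes in n trials: $p^x (1 - p)^{n - x}$
PROBABILITY FUNCTION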
For example:
Let a success be rolling a die and getting a 2.
We roll the die n = 10 times and successfully roll a 2 on x = 3 of them.
We are interested in the probability that exactly 3 of the 10 rolls equal 2.
BINOMIAL DISTRIBUTION EXAMPLE
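The die example can be worked out with a small Python sketch of the binomial probability function: the number of ways to choose which x of the n trials are successes, multiplied by the probability of any one such sequence. The function name `binomial_pmf` is illustrative, and a fair six-sided die (p = 1/6) is assumed.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials with success probability p."""
    # comb(n, x) counts the sequences with exactly x successes;
    # p**x * (1 - p)**(n - x) is the probability of any one such sequence.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumption: a fair six-sided die, so P(rolling a 2) = 1/6.
prob_three_twos = binomial_pmf(3, 10, 1 / 6)
print(f"P(exactly three 2s in 10 rolls) = {prob_three_twos:.3f}")
```

Under these assumptions the probability comes out to roughly 0.155, so exactly three 2s in ten rolls happens about 15.5% of the time.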
You have a retention rate of 90% for your employees annually.
In other words, any randomly chosen employee has a probability of 0.1 of leaving this year.
Choosing 3 employees at random, what is the probability that exactly 1 of them will leave the company this year?
BINOMIAL DISTRIBUTION EXAMPLE
BINOMIAL DISTRIBUTION EXAMPLE
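1st Worker     2nd Worker     3rd Worker     x   Prob
Leaves (0.1)   Leaves (0.1)   Leaves (0.1)   3   0.001
Leaves (0.1)   Leaves (0.1)   Stays (0.9)    2   0.009
Leaves (0.1)   Stays (0.9)    Leaves (0.1)   2   0.009
Leaves (0.1)   Stays (0.9)    Stays (0.9)    1   0.081
Stays (0.9)    Leaves (0.1)   Leaves (0.1)   2   0.009
Stays (0.9)    Leaves (0.1)   Stays (0.9)    1   0.081
Stays (0.9)    Stays (0.9)    Leaves (0.1)   1   0.081
Stays (0.9)    Stays (0.9)    Stays (0.9)    0   0.729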
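The eight branches of the employee tree can be enumerated in Python to answer the question directly. This is a sketch only; the names `p` and `leavers_dist` are illustrative, and "L" and "S" stand for "Leaves" and "Stays" as in the tree.

```python
from itertools import product
from math import prod

# Probability that a single employee leaves (L) or stays (S) this year.
p = {"L": 0.1, "S": 0.9}

leavers_dist = {}
for sequence in product("LS", repeat=3):       # all 8 branches of the tree
    prob = prod(p[outcome] for outcome in sequence)
    x = sequence.count("L")                    # number of employees who leave
    leavers_dist[x] = leavers_dist.get(x, 0.0) + prob

print(round(leavers_dist[1], 3))  # 0.243
```

Three branches have exactly one employee leaving, each with probability 0.081, so the probability that exactly 1 of the 3 leaves is 3 × 0.081 = 0.243.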
For the binomial distribution, the following is always true:
Expected value: $E(x) = \mu = np$
Variance: $\mathrm{Var}(x) = \sigma^2 = np(1 - p)$
Standard deviation: $\sigma = \sqrt{np(1 - p)}$
EXPECTED VALUE AND VARIANCE/STANDARD DEVIATION
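As a rough check, the shortcut results for the binomial distribution (mean np, variance np(1 − p)) can be compared with the general definitions of expected value and variance applied to the binomial probabilities. The function names below are illustrative, and the two parameter sets are the die example (n = 10, p = 1/6, assuming a fair die) and the employee example (n = 3, p = 0.1).

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Shortcut formulas: mean n*p, variance n*p*(1-p), and the standard deviation."""
    variance = n * p * (1 - p)
    return n * p, variance, sqrt(variance)

def moments_from_definition(n, p):
    """Mean and variance computed as probability-weighted sums over x = 0..n."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, variance

dice_mean, dice_var, dice_sd = binomial_moments(10, 1 / 6)
staff_mean, staff_var, staff_sd = binomial_moments(3, 0.1)

print(f"die:       mean={dice_mean:.2f}  variance={dice_var:.2f}  sd={dice_sd:.2f}")
print(f"employees: mean={staff_mean:.2f}  variance={staff_var:.2f}  sd={staff_sd:.2f}")
```

Both routes give the same numbers up to floating-point error, which is why the shortcut formulas can be used without building the full probability table.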
For example:
Let a success be rolling a die and getting a 2.
We roll the die n = 10 times and successfully roll a 2 on x = 3 of them.
We are interested in the probability that exactly 3 of the 10 rolls equal 2.
BINOMIAL DISTRIBUTION EXAMPLE
$E(x) = np = 10 \times \tfrac{1}{6} \approx 1.67$
We expect to roll a 2 about 1.67 times in 10 rolls.
For example:
You have a retention rate of 90% for your employees annually.
In other words, any randomly chosen employee has a probability of 0.1 of leaving this year.
Choosing 3 employees at random, what is the probability that exactly 1 of them will leave the company this year?
BINOMIAL DISTRIBUTION EXAMPLE
$E(x) = np = 3 \times 0.1 = 0.3$
We expect 0.3 of the 3 employees to leave this year.
The binomial distribution looks at the probabilities of the number of successes occurring in the n independent trials.
The binomial probability function is composed of two intuitive pieces:
Number of outcomes providing exactly x successes in n trials: $\binom{n}{x}$
Probability of a particular sequence of trial outcomes with x successes in n trials: $p^x (1 - p)^{n - x}$
SUMMARY