Slides: Randomness in Data
RANDOMNESS IN DATA
ST101 – DR. ARIC LABARR
PROBABILITY (AND RISK)
CHANCE
Random – an outcome is random if we know the possible outcomes a process could produce but are unsure which of those outcomes is about to happen.
People throughout history have tried to measure patterns in randomness and answer the question, “what would happen if we did this many times?”
CHANCE
Try flipping a coin.
Each flip is completely random 🡪 you are unsure of the specific outcome.
If your coin is fair (evenly weighted), then in many flips you should get approximately 50% heads and 50% tails.
PROBABILITY
The probability that an event happens is a numerical measure of the likelihood of that event’s occurrence.
[Figure: probability scale from 0 to 1. Near 0, the event is very unlikely to happen; at 0.5, the event is as likely to happen as not; near 1, the event is very likely to happen.]
Probabilities are numbers between 0 and 1.
Percentages are numbers between 0 and 100.
Sample space: the collection of all possible outcomes in a random process.
The probabilities of all outcomes in the sample space must sum to 1.
EVENTS
An event is a collection of one or more outcomes from a process whose result cannot be predicted with certainty.
The probability of an event A is denoted P(A).
Examples:
When flipping a fair coin, what is the probability of it landing on heads?
When rolling a fair die, what is the probability of rolling a 6?
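If we assume every side of the coin and every face of the die is equally likely, each example reduces to counting favorable outcomes over total outcomes. The following is a minimal Python sketch of that counting; the helper name `classical_probability` is hypothetical and not part of the slides.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = (# outcomes in event) / (# outcomes in sample space),
    assuming every outcome in sample_space is equally likely."""
    favorable = [o for o in sample_space if o in event]
    return Fraction(len(favorable), len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_heads = classical_probability({"H"}, coin)  # landing on heads
p_six = classical_probability({6}, die)       # rolling a 6
print(p_heads, p_six)  # 1/2 1/6
```

Using `Fraction` keeps the answers exact (1/2 and 1/6) instead of rounded decimals.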
MULTI-STEP RANDOM PROCESSES
TREE DIAGRAMS
Multi-step random processes can be visualized easily with tree diagrams.
For example, you have a 2-step random process where you flip a coin twice:
Flip Coin
  → Heads – Flip → Heads or Tails
  → Tails – Flip → Heads or Tails
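Each root-to-leaf path of the tree is one outcome of the 2-step process. As a small sketch, Python's `itertools.product` can list those paths by pairing every possible first flip with every possible second flip.

```python
from itertools import product

# Each step of the tree has two branches: Heads or Tails.
sides = ["Heads", "Tails"]

# Every path from the root ("Flip Coin") to a leaf is one outcome
# of the 2-step process; product() walks all root-to-leaf paths.
outcomes = list(product(sides, repeat=2))

for first, second in outcomes:
    print(f"{first} -> {second}")
print(len(outcomes), "outcomes")  # 4 outcomes
```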
ASSIGNING PROBABILITIES
Probabilities of an event occurring must be between 0 and 1.
The sum of the probabilities of all outcomes (sample points) in an experiment must equal 1.
There are three typical methods for assigning probabilities to events:
Classical Method
Relative Frequency Method
Subjective Method
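CLASSICAL METHOD
The classical method assigns probabilities when all outcomes are equally likely: if an experiment has n equally likely outcomes, each outcome has probability 1/n.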
RELATIVE FREQUENCY METHOD
The relative frequency method of assigning probabilities assigns probabilities based on experimentation or historical data.
For example, you don’t believe that I have a fairly weighted die, so you ask me to roll it 100 times and get the following:
Value of the Roll    Frequency    Experimental Probability
1                    10           0.10
2                    25           0.25
3                    42           0.42
4                    7            0.07
5                    10           0.10
6                    6            0.06
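The experimental probabilities in the table come from dividing each frequency by the total number of rolls. Here is a short Python sketch of the relative frequency method that uses the table's counts and also checks that the result is a valid probability assignment.

```python
import math

# Frequencies from the 100 rolls in the table.
frequencies = {1: 10, 2: 25, 3: 42, 4: 7, 5: 10, 6: 6}

total_rolls = sum(frequencies.values())  # 100
# Relative frequency method: P(value) is estimated by count / total trials.
experimental = {value: count / total_rolls for value, count in frequencies.items()}

for value, p in experimental.items():
    print(value, f"{p:.2f}")

# A valid assignment: every probability is in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in experimental.values())
assert math.isclose(sum(experimental.values()), 1.0)
```

`math.isclose` is used because adding floating-point values may not give exactly 1.0.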
SUBJECTIVE METHOD
Circumstances can change quickly for the events you are assigning probabilities to, so probabilities shouldn’t be based solely on historical data.
Use a combination of historical data and your experience and intuition about how likely an event is to occur.
The best probability estimates are typically a combination of the subjective and classical/relative frequency methods.
SUMMARY
An outcome is random if we know the possible outcomes a process could produce but are unsure which of those outcomes is about to happen.
The probability that an event happens is a numerical measure of the likelihood of that event’s occurrence.
Methods for assigning probabilities: Classical, Relative Frequency, and Subjective.
An event is a collection of one or more outcomes from a process whose result cannot be predicted with certainty.
LAW OF LARGE NUMBERS
CHANCE
Chance behavior is unpredictable in the short run, but predictable in the long run.
EXAMPLE – TOSS A COIN
Toss a coin 500 times and record the proportion of heads as you go.
Early on, the proportion of heads can vary drastically.
In the long run, it settles near what we expect.
If we flip the coin enough times, the overall proportion of times it lands on heads (or tails) gets closer to 50%.
It is reasonable to assume that it is a fair coin: half of the time it lands on heads.
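The 500-toss experiment can be simulated instead of performed by hand. This is a minimal Python sketch under two assumptions: the coin is fair, and Python's `random.Random` generator is a good enough stand-in for real flips. The function name and the seed value 101 are arbitrary choices for illustration, and a different seed will give different numbers.

```python
import random

def running_proportion_of_heads(n_tosses, seed=None):
    """Simulate n_tosses fair-coin flips and return the proportion of heads
    after each toss (entry k is the proportion over the first k+1 tosses)."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for k in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # fair coin: heads with probability 0.5
            heads += 1
        proportions.append(heads / k)
    return proportions

props = running_proportion_of_heads(500, seed=101)
for k in (1, 5, 10, 50, 100, 500):
    print(f"after {k:>3} tosses: {props[k - 1]:.3f}")
```

The first few proportions can only be coarse values such as 0, 0.5, or 1. After hundreds of tosses they typically stay close to 0.5.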
LAW OF LARGE NUMBERS
The law of large numbers states that as the number of independent trials increases, the proportion of trials in which a given event occurs gets closer and closer to a single value (the probability of the event).
MYTH OF SHORT RUN PREDICTABILITY
Chance behavior is unpredictable in the short run, but predictable in the long run.
This is counter-intuitive to most people.
Example – Which of the following outcomes of flipping a fair coin 4 times is more probable?
H, T, T, H
H, H, H, H
SAME! Each specific sequence of 4 independent fair flips has probability $(1/2)^4 = 1/16$.
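To check the "SAME!" answer, we can multiply the probability of each flip in a sequence, assuming the flips are independent. The helper `sequence_probability` in this Python sketch is hypothetical.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(seq, p_heads=Fraction(1, 2)):
    """Probability of one specific sequence of independent flips,
    e.g. "HTTH": multiply the probability of each flip in order."""
    prob = Fraction(1)
    for flip in seq:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

print(sequence_probability("HTTH"), sequence_probability("HHHH"))  # 1/16 1/16

# Sanity check: the 2**4 = 16 possible sequences are equally likely
# and their probabilities sum to 1.
all_sequences = ["".join(s) for s in product("HT", repeat=4)]
print(len(all_sequences), sum(sequence_probability(s) for s in all_sequences))  # 16 1
```

With a different `p_heads`, such as `Fraction(7, 10)` for a hypothetical 70% shooter, HHHH gets 2401/10000 and HTTH gets 441/10000. The two sequences are equally likely only when each flip (or shot) is an independent 50/50 event.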
MYTH OF SHORT RUN PREDICTABILITY
[Tree diagram: Flip Coin, where every Heads or Tails branch leads to another flip, ending in Heads or Tails leaves.]
MYTH OF THE “HOT HAND”
Chance behavior is unpredictable in the short run, but predictable in the long run.
This is counter-intuitive to most people.
Example – A basketball player makes 4 straight shots, compared with making 1, missing 2, then making 1 (H = make, T = miss). Same idea as the coins, treating each shot as an independent 50/50 event!
H, T, T, H
H, H, H, H
MYTH OF THE “LAW OF AVERAGES”
Chance behavior is unpredictable in the short run, but predictable in the long run.
This is counter-intuitive to most people.
Example – A couple has four children, all boys. They think their fifth child is “due” to be a girl. Same idea as the coins! The chance is still 50/50.
The “law of averages” doesn’t exist 🡪 proportions even out in the long run, but the chances of individual outcomes do not change.
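The "due for a girl" reasoning can be tested by enumeration. This Python sketch relies on the same simplifying assumption as the slide: each birth is independently a boy or a girl with probability 1/2. It lists all five-child families, keeps those whose first four children are boys, and counts how often the fifth child is a girl.

```python
from fractions import Fraction
from itertools import product

# Assumption (the slide's simplification): each birth is independently a boy
# ("B") or girl ("G") with probability 1/2, so all 2**5 sequences of five
# children are equally likely.
families = list(product("BG", repeat=5))

# Condition on the first four children all being boys.
four_boys = [f for f in families if f[:4] == ("B", "B", "B", "B")]
fifth_is_girl = [f for f in four_boys if f[4] == "G"]

p_girl_given_four_boys = Fraction(len(fifth_is_girl), len(four_boys))
print(p_girl_given_four_boys)  # 1/2
```

Only two of the 32 families begin with four boys, and exactly one of them has a girl fifth. The earlier births do not shift the chance.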
SUMMARY
Chance behavior is unpredictable in the short run, but predictable in the long run.
The law of large numbers states that as the number of independent trials increases, the proportion of trials in which a given event occurs gets closer and closer to a single value (the probability of the event).
The “law of averages” doesn’t exist 🡪 proportions even out in the long run, but the chances of individual outcomes do not change.
BASIC PROBABILITY RULES
BASIC RELATIONSHIPS
You do not always know all of the sample point probabilities in an event.
However, there are some basic probability relationships that can still be used to calculate the probability of an event occurring:
Complement of an Event
Union of Two Events
Intersection of Two Events
Mutually Exclusive Events
COMPLEMENT OF AN EVENT
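[Venn diagram: Event A inside the Sample Space; everything in the sample space outside A is the complement of A.]
The complement of A, written $A^c$, is the event that A does not occur, so $P(A^c) = 1 - P(A)$.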
UNION OF TWO EVENTS
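[Venn diagram: Event A and Event B inside the Sample Space.]
The union of A and B, written $A \cup B$, is the event containing all outcomes in A, in B, or in both.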
INTERSECTION OF TWO EVENTS
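[Venn diagram: Event A and Event B inside the Sample Space, with their overlap labeled Intersection.]
The intersection of A and B, written $A \cap B$, is the event containing all outcomes in both A and B.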
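Events are sets of outcomes, so complement, union, and intersection correspond directly to set operations. This is a Python sketch on a hypothetical example that is not from the slides: a fair six-sided die, with A = "roll an even number" and B = "roll a 4 or higher". The helper `prob` is also hypothetical.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die is the sample space,
# and events are subsets of it (the circles in a Venn diagram).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: roll an even number
B = {4, 5, 6}   # event B: roll a 4 or higher

def prob(event):
    # Classical method: equally likely outcomes.
    return Fraction(len(event), len(sample_space))

complement_A = sample_space - A   # outcomes not in A
union_AB = A | B                  # outcomes in A, in B, or in both
intersection_AB = A & B           # outcomes in both A and B

print(sorted(complement_A), prob(complement_A))        # [1, 3, 5] 1/2
print(sorted(union_AB), prob(union_AB))                # [2, 4, 5, 6] 2/3
print(sorted(intersection_AB), prob(intersection_AB))  # [4, 6] 1/3
```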
ADDITION LAW
The addition law provides a way to compute the probability of the union of events A and B:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
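The subtraction is there because outcomes in the overlap are counted once in P(A) and again in P(B). This Python sketch checks the addition law on the same hypothetical die events as above (even roll, roll of 4 or higher) by comparing a direct count of the union with the formula.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die, equally likely outcomes
A = {2, 4, 6}                       # even roll
B = {4, 5, 6}                       # roll of 4 or higher

def prob(event):
    return Fraction(len(event), len(sample_space))

direct = prob(A | B)                              # count the union directly
addition_law = prob(A) + prob(B) - prob(A & B)    # subtract the double-counted overlap
naive_sum = prob(A) + prob(B)                     # overcounts outcomes 4 and 6
print(direct, addition_law, naive_sum)  # 2/3 2/3 1
```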
MUTUALLY EXCLUSIVE EVENTS
Two events are mutually exclusive if the events have no sample points in common – do not intersect.
This also means that the events cannot both occur. If one event occurs, the other cannot.
[Venn diagram: Event A and Event B inside the Sample Space, with no overlap.]
ADDITION LAW – MUTUALLY EXCLUSIVE EVENTS
The addition law provides a way to compute the probability of the union of events A and B.
If two events are mutually exclusive, they do not intersect, so $P(A \cap B) = 0$ and
$P(A \cup B) = P(A) + P(B)$
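When the intersection is empty, the overlap term is 0 and the probabilities simply add. This Python sketch uses hypothetical mutually exclusive die events that are not from the slides: "roll a 1 or 2" and "roll a 5 or 6".

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die
A = {1, 2}      # roll a 1 or 2
B = {5, 6}      # roll a 5 or 6 (no outcomes in common with A)

def prob(event):
    return Fraction(len(event), len(sample_space))

assert not (A & B)                     # mutually exclusive: empty intersection
print(prob(A | B), prob(A) + prob(B))  # 2/3 2/3
```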
EXAMPLE
Weather            Spring     Summer       Fall       Winter     Total
Clear or Cloudy    626,986    799,443      519,487    312,036    2,257,952
Misty              288,096    250,679      302,510    155,573    996,858
Rain or Snow       3,507      11,007       19,616     3,739      37,869
Total              918,589    1,061,129    841,613    471,348    3,292,679
EXAMPLE
What is the probability a random customer uses the bike service in Fall and it was raining or snowing?
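"Fall and raining or snowing" is an intersection, which corresponds to a single cell of the table divided by the grand total. The following Python sketch copies the counts from the weather-by-season table. The dictionary layout is only an illustrative choice.

```python
from fractions import Fraction

# Counts from the weather-by-season table (rows: weather, columns: season).
seasons = ["Spring", "Summer", "Fall", "Winter"]
counts = {
    "Clear or Cloudy": [626_986, 799_443, 519_487, 312_036],
    "Misty":           [288_096, 250_679, 302_510, 155_573],
    "Rain or Snow":    [3_507, 11_007, 19_616, 3_739],
}
grand_total = sum(sum(row) for row in counts.values())  # 3,292,679

fall = seasons.index("Fall")
# Intersection: customers in the single cell (Fall, Rain or Snow).
p_fall_and_rain = Fraction(counts["Rain or Snow"][fall], grand_total)
print(f"{float(p_fall_and_rain):.4f}")  # 0.0060
```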
EXAMPLE
What is the probability a random customer uses the bike service in Fall or it was raining or snowing?
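"Fall or raining or snowing" is a union, and the two events overlap in the (Fall, Rain or Snow) cell. The addition law therefore subtracts that cell once. This Python sketch uses the same table counts as above; the variable names are illustrative.

```python
from fractions import Fraction

seasons = ["Spring", "Summer", "Fall", "Winter"]
counts = {
    "Clear or Cloudy": [626_986, 799_443, 519_487, 312_036],
    "Misty":           [288_096, 250_679, 302_510, 155_573],
    "Rain or Snow":    [3_507, 11_007, 19_616, 3_739],
}
grand_total = sum(sum(row) for row in counts.values())
fall = seasons.index("Fall")

fall_total = sum(row[fall] for row in counts.values())   # 841,613
rain_total = sum(counts["Rain or Snow"])                 # 37,869
both = counts["Rain or Snow"][fall]                      # 19,616

# Addition law: P(Fall or Rain) = P(Fall) + P(Rain) - P(Fall and Rain)
p_union = (Fraction(fall_total, grand_total)
           + Fraction(rain_total, grand_total)
           - Fraction(both, grand_total))
print(f"{float(p_union):.4f}")  # 0.2611
```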
SUMMARY
CONDITIONAL PROBABILITIES
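CONDITIONAL PROBABILITIES
The conditional probability of event A given that event B has occurred is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.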
MULTIPLICATION LAW
The multiplication law provides a way to compute the probability of the intersection of two events as long as you know the conditional probabilities:
$P(A \cap B) = P(B)\,P(A \mid B)$
OR
$P(A \cap B) = P(A)\,P(B \mid A)$
INDEPENDENT EVENTS
If the probability of an event A is not changed by the occurrence of event B, then the two events are called independent:
$P(A \mid B) = P(A)$
OR
$P(B \mid A) = P(B)$
MULTIPLICATION LAW – INDEPENDENT EVENTS
The multiplication law provides a way to compute the probability of the intersection of two events as long as you know the conditional probabilities. For independent events it simplifies to:
$P(A \cap B) = P(A)\,P(B)$
ONLY IF EVENTS ARE INDEPENDENT
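Checking whether $P(A \cap B) = P(A)\,P(B)$ holds is a direct test for independence. This Python sketch applies that test to hypothetical events on a fair die, which are not from the slides. "Even" and "roll a 1 or 2" turn out to be independent: P(even | low) = (1/6)/(1/3) = 1/2 = P(even). "Even" and "roll a 4 or higher" are not independent.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # fair die

def prob(event):
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    # Independent exactly when P(A and B) = P(A) * P(B).
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
low = {1, 2}          # roll a 1 or 2
high = {4, 5, 6}      # roll a 4 or higher

print(independent(even, low))   # True:  P = 1/6 = 1/2 * 1/3
print(independent(even, high))  # False: P = 1/3, but 1/2 * 1/2 = 1/4
```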
TREE DIAGRAMS
Tree diagrams can help calculate probabilities in a series of independent events.
For example, you have a 2-step random process where you flip a coin twice (independent flips):
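Flip Coin
  → Heads (0.5) – Flip → Heads (0.5) or Tails (0.5)
  → Tails (0.5) – Flip → Heads (0.5) or Tails (0.5)
For a fair coin, multiply the probabilities along a path: P(Heads, then Heads) = 0.5 × 0.5 = 0.25, and each of the four paths has probability 0.25.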
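Multiplying along the branches is the multiplication law for independent events applied one step at a time. This Python sketch computes every path probability in the tree. The function name `path_probabilities` and its `p_heads` parameter are hypothetical. The parameter allows a biased coin, but the multiplication is only valid because the flips are independent.

```python
from fractions import Fraction
from itertools import product

def path_probabilities(p_heads=Fraction(1, 2), flips=2):
    """Multiply branch probabilities along each root-to-leaf path of the tree.
    Valid only because the flips are independent."""
    branch = {"H": p_heads, "T": 1 - p_heads}
    result = {}
    for path in product("HT", repeat=flips):
        prob = Fraction(1)
        for side in path:
            prob *= branch[side]
        result["".join(path)] = prob
    return result

fair = path_probabilities()
for path, p in fair.items():
    print(path, p)  # each of HH, HT, TH, TT prints 1/4
```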
EXAMPLE
What is the probability a random customer uses the bike service on a misty day, given that it is winter?
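There are two ways to get this answer. We can restrict attention to the Winter column, or we can apply $P(\text{Misty} \mid \text{Winter}) = P(\text{Misty} \cap \text{Winter}) / P(\text{Winter})$. This Python sketch uses the same table counts as above and shows that both methods agree exactly.

```python
from fractions import Fraction

seasons = ["Spring", "Summer", "Fall", "Winter"]
counts = {
    "Clear or Cloudy": [626_986, 799_443, 519_487, 312_036],
    "Misty":           [288_096, 250_679, 302_510, 155_573],
    "Rain or Snow":    [3_507, 11_007, 19_616, 3_739],
}
grand_total = sum(sum(row) for row in counts.values())
winter = seasons.index("Winter")

winter_total = sum(row[winter] for row in counts.values())  # 471,348
misty_and_winter = counts["Misty"][winter]                  # 155,573

# Method 1: restrict attention to the Winter column.
p_direct = Fraction(misty_and_winter, winter_total)

# Method 2: P(Misty | Winter) = P(Misty and Winter) / P(Winter).
p_formula = Fraction(misty_and_winter, grand_total) / Fraction(winter_total, grand_total)

print(p_direct == p_formula, f"{float(p_direct):.4f}")  # True 0.3301
```

Compare this with the unconditional P(Misty) = 996,858 / 3,292,679 ≈ 0.303. Knowing it is winter changes the probability slightly, so in this data the two events are not exactly independent.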
SUMMARY
The probability of an event given that another event has occurred is called a conditional probability.
The multiplication law provides a way to compute the probability of the intersection of two events as long as you know the conditional probabilities.
If the probability of an event A is not changed by the occurrence of event B, then the two events are called independent.