Slides: Distributions of Continuous Data
DISTRIBUTIONS OF CONTINUOUS DATA
ST101 – DR. ARIC LABARR
A random variable is a numerical description of the outcome of an experiment.
They can be either discrete or continuous.
A discrete random variable may assume either a finite number of values or an infinite sequence of values.
A continuous random variable may assume any numerical value in an interval or collection of intervals.
RANDOM VARIABLES
CONTINUOUS RANDOM VARIABLES
A continuous random variable can assume any value in an interval on the real line or in a collection of intervals on the real line.
The probability that a continuous random variable equals any one particular value is zero, so it is not meaningful to talk about the probability of a single value.
Instead, we talk about the probability of the random variable assuming a value inside of a given interval.
PROBABILITIES ON INTERVALS
POPULAR CONTINUOUS DISTRIBUTIONS
Uniform
Exponential
Normal
A continuous random variable may assume any numerical value in an interval or collection of intervals.
The probability of any single particular value is zero, so we instead talk about probabilities of intervals.
SUMMARY
UNIFORM DISTRIBUTION
DISTRIBUTIONS OF CONTINUOUS DATA
A random variable follows a uniform distribution whenever the probability is proportional to the interval’s length.
In other words, intervals of equal length within the range are equally likely.
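The probability density function for the uniform distribution on the interval from $a$ to $b$ is: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere.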
UNIFORM PROBABILITY DISTRIBUTION
Assume that sales calls that go into a company are uniformly distributed by the years of experience of the sales staff so that everyone has the same chance of getting a call.
Years of experience range from 2 to 12.
EXAMPLE OF UNIFORM DISTRIBUTION
What is the probability a call is answered by an employee with 10 to 12 years of experience?
EXAMPLE OF UNIFORM DISTRIBUTION
The probability is the area under the curve between 10 and 12: $(12 - 10) \times \frac{1}{12 - 2} = 0.2$.
EXAMPLE OF UNIFORM DISTRIBUTION
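The area calculation for the sales-call example can be checked with a short Python sketch. The function name `uniform_interval_probability` and its parameters are illustrative assumptions, not part of the original slides; only the bounds 2 and 12 and the interval 10 to 12 come from the example.

```python
def uniform_interval_probability(a, b, lo, hi):
    """P(lo <= X <= hi) for X uniform on [a, b] (hypothetical helper)."""
    if not a < b:
        raise ValueError("need a < b")
    # Clip the query interval to [a, b]; outside that range the density is 0.
    lo, hi = max(lo, a), min(hi, b)
    if hi <= lo:
        return 0.0
    # Area of a rectangle: width (hi - lo) times height 1 / (b - a).
    return (hi - lo) / (b - a)


p = uniform_interval_probability(2, 12, 10, 12)
print(p)  # 0.2
```

Because the density is flat, the probability depends only on the width of the interval, not on where it sits inside the range.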
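Expected Value: for a uniform distribution on the interval from $a$ to $b$, $E(X) = \frac{a + b}{2}$
Variance: $Var(X) = \frac{(b - a)^2}{12}$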
MEASURES ON UNIFORM DISTRIBUTION
Assume that sales calls that go into a company are uniformly distributed by the years of experience of the sales staff so that everyone has the same chance of getting a call.
What is the expected years of experience of a person answering a new sales call?
EXAMPLE OF UNIFORM DISTRIBUTION
The expected value is the midpoint of the range: $E(X) = \frac{2 + 12}{2} = 7$ years of experience.
EXAMPLE OF UNIFORM DISTRIBUTION
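A minimal Python sketch of the two measures for the sales-call example follows. The helper names `uniform_mean` and `uniform_variance` are assumptions made for illustration; they apply the standard formulas for a uniform distribution on the interval from a to b.

```python
def uniform_mean(a, b):
    """Expected value of a uniform distribution on [a, b]: the midpoint."""
    return (a + b) / 2


def uniform_variance(a, b):
    """Variance of a uniform distribution on [a, b]: (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


mean = uniform_mean(2, 12)
variance = uniform_variance(2, 12)
print(mean)                # 7.0
print(round(variance, 3))  # 8.333
```

The variance depends only on the width of the range, so shifting the range (for example, 12 to 22 years) would change the mean but not the variance.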
A random variable follows a uniform distribution whenever the probability is proportional to the interval’s length.
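The probability density function for the uniform distribution on the interval from $a$ to $b$ is: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere.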
SUMMARY
NORMAL DISTRIBUTION
DISTRIBUTIONS OF CONTINUOUS DATA
The Normal probability distribution is one of the most common and important distributions for describing a continuous random variable.
The Normal distribution is the foundation of statistical inference:
Hypothesis Testing
Confidence Intervals
Regression Analysis
Appears in nature and real-world data.
IMPORTANCE
The probability density function for the Normal distribution is defined as: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
PROBABILITY DENSITY FUNCTION
CHARACTERISTICS OF NORMAL DISTRIBUTION
Values near the mean are more probable; values far from the mean are less probable.
The mean can take ANY value.
The standard deviation controls the width.
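These characteristics can be illustrated numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names and the particular means and standard deviations are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)
shifted = NormalDist(mu=5, sigma=1)

# Values near the mean have higher density than values far from it.
print(narrow.pdf(0) > narrow.pdf(1) > narrow.pdf(2))  # True

# The curve is symmetric: equal distances from the mean give equal density.
print(math.isclose(narrow.pdf(-1.5), narrow.pdf(1.5)))  # True

# Changing the mean only moves the curve; the peak height is unchanged.
print(math.isclose(shifted.pdf(5), narrow.pdf(0)))  # True

# A larger standard deviation widens the curve and lowers its peak.
print(wide.pdf(0) < narrow.pdf(0))  # True
print(wide.pdf(3) > narrow.pdf(3))  # True
```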
The Normal probability distribution is one of the most common and important distributions for describing a continuous random variable.
The Normal distribution is the foundation of statistical inference.
The Normal distribution is symmetric and bell-shaped: the mean sets its center and the standard deviation sets its width.
SUMMARY
EMPIRICAL RULE
DISTRIBUTIONS OF CONTINUOUS DATA
The probabilities for the Normal random variable are determined by the area under the curve.
The total area under the curve = 1.
Because the Normal distribution is perfectly symmetric around the mean (which is also the median), the area under the curve below the mean equals the area above the mean: 0.5 each.
PROBABILITIES
EMPIRICAL RULE
About 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
Assume new employees at a company have previous years of professional experience that follow a Normal distribution where the mean is 7.5 and the standard deviation is 2.5.
What is the probability any random new employee has between 5 and 10 years of experience?
EXAMPLE
Because 5 and 10 are each one standard deviation from the mean (7.5 ± 2.5), the empirical rule gives a probability of about 68%.
EXAMPLE
Assume new employees at a company have previous years of professional experience that follow a Normal distribution where the mean is 7.5 and the standard deviation is 2.5.
What is the probability any random new employee has between 2.5 and 10 years of experience?
EXAMPLE
Here 2.5 is two standard deviations below the mean and 10 is one standard deviation above it. By symmetry, the probability is about 95%/2 + 68%/2 = 47.5% + 34% = 81.5%.
EXAMPLE
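The two empirical-rule answers can be compared with areas computed directly from the Normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper `prob_between` and the variable name `experience` are assumptions introduced for illustration.

```python
from statistics import NormalDist

# Years of experience: Normal with mean 7.5 and standard deviation 2.5.
experience = NormalDist(mu=7.5, sigma=2.5)


def prob_between(dist, lo, hi):
    """Area under the curve between lo and hi (hypothetical helper)."""
    return dist.cdf(hi) - dist.cdf(lo)


# One standard deviation on each side of the mean (rule of thumb: 68%).
print(round(prob_between(experience, 5, 10), 4))    # 0.6827

# Two below and one above the mean (rule of thumb: 81.5%).
print(round(prob_between(experience, 2.5, 10), 4))  # 0.8186
```

The rounded percentages of the empirical rule land close to, but not exactly on, the computed areas.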
The empirical rule (68, 95, 99.7 rule) is good for quick, rough analysis.
It is not suited to exact analysis unless you are only interested in whole numbers of standard deviations.
What about fractions of standard deviations away from the mean?
We need another way to calculate the area under the curve quickly.
SUMMARY
STANDARD SCORES
DISTRIBUTIONS OF CONTINUOUS DATA
A random variable having a Normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard Normal probability distribution.
All Normal distributions can be converted into standard Normal distributions for ease of computing probabilities under the curve.
Standard Normal probability tables help calculate area under the curve.
CONVERSION OF NORMAL DISTRIBUTIONS
The standard Normal table extends the empirical rule: it gives the area under the standard Normal curve to the left of any z-value specified to two decimal places.
STANDARD NORMAL TABLE
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
...
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
...
To calculate the area to the right of any point, use the complement rule from the laws of probability: $P(Z > z) = 1 - P(Z \le z)$.
CALCULATING OPPOSITE PROBABILITIES
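Table entries and right-tail areas can be reproduced in a few lines of Python. The sketch below assumes `statistics.NormalDist` from the standard library; the function name `table_row` and the choice of z = 0.57 are illustrative and do not come from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1


def table_row(z_tenths):
    """Left-tail areas for z_tenths + .00 through z_tenths + .09,
    rounded to four places like a printed table (hypothetical helper)."""
    return [
        round(Z.cdf(round(z_tenths + hundredths / 100, 2)), 4)
        for hundredths in range(10)
    ]


# Reproduce the 0.5 row of the standard Normal table.
print(table_row(0.5))

# Area to the left of z = 0.57 (row 0.5, column .07).
left = Z.cdf(0.57)
print(round(left, 4))   # 0.7157

# Area to the right of z = 0.57 by the complement rule.
right = 1 - left
print(round(right, 4))  # 0.2843
```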
All Normal distributions can be converted into standard Normal distributions for ease of computing probabilities under the curve.
CONVERSION OF NORMAL DISTRIBUTIONS
The z-score measures how many standard deviations a value $x$ lies from the mean: $z = \frac{x - \mu}{\sigma}$.
Z-SCORES
Assume that the daily number of total users follows a Normal distribution. The average daily number of total users is 4,504 with a standard deviation of 1,937. What is the probability that any random day has more than 6,000 total users?
Z-SCORES BIKE DATA EXAMPLE
Convert 6,000 users to a z-score: $z = \frac{6000 - 4504}{1937} \approx 0.77$.
Z-SCORES BIKE DATA EXAMPLE
Look up z = 0.77 in the standard Normal table (row 0.7, column .07):
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
Z-SCORES BIKE DATA EXAMPLE
The table gives $P(Z \le 0.77) = 0.7794$, so $P(X > 6000) = P(Z > 0.77) = 1 - 0.7794 = 0.2206$. About 22% of days have more than 6,000 total users.
Z-SCORES BIKE DATA EXAMPLE
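The bike data question can also be worked in Python as a check on the table lookup. This sketch assumes `statistics.NormalDist` from the standard library; the variable names are illustrative, and only the mean of 4,504, the standard deviation of 1,937, and the threshold of 6,000 come from the example.

```python
from statistics import NormalDist

mean_users = 4504
sd_users = 1937
threshold = 6000

# Convert the threshold to a z-score.
z = (threshold - mean_users) / sd_users
print(round(z, 2))  # 0.77

# Table-style answer: round z to two decimals, then take the right tail.
p_table = 1 - NormalDist().cdf(round(z, 2))
print(round(p_table, 3))  # 0.221

# Same question without rounding z, using the original scale directly.
p_exact = 1 - NormalDist(mu=mean_users, sigma=sd_users).cdf(threshold)
print(round(p_exact, 2))  # 0.22
```

Rounding z to two decimals, as the table requires, changes the answer only slightly.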
Assume that the daily number of total users follows a Normal distribution. The average daily number of total users is 4,504 with a standard deviation of 1,937. What number of daily users marks the cutoff for the bottom 10% of days?
Z-SCORES BIKE DATA EXAMPLE
Find the z-score whose area to the left is closest to 0.10 in the standard Normal table:
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
...
-1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
-1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
-1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
-1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
-1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
...
Z-SCORES BIKE DATA EXAMPLE
The closest entry is .1003 at z = -1.28 (row -1.2, column .08). Converting back to users: $x = \mu + z\sigma = 4504 + (-1.28)(1937) \approx 2025$. Days with fewer than about 2,025 total users fall in the bottom 10%.
Z-SCORES BIKE DATA EXAMPLE
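Going from a probability back to a value is the reverse of the table lookup. A minimal sketch, assuming `statistics.NormalDist` and its `inv_cdf` method from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mean_users = 4504
sd_users = 1937

# z-score with 10% of the area to its left.
z_cut = NormalDist().inv_cdf(0.10)
print(round(z_cut, 2))  # -1.28

# Table-style answer: use z rounded to two decimals, then x = mean + z * sd.
cutoff_table = mean_users + round(z_cut, 2) * sd_users
print(round(cutoff_table))  # 2025

# Same cutoff without rounding z, computed on the original scale.
cutoff_exact = NormalDist(mu=mean_users, sigma=sd_users).inv_cdf(0.10)
print(round(cutoff_exact))  # 2022
```

The small gap between the two cutoffs comes from the table limiting z to two decimal places.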
A random variable having a Normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard Normal probability distribution.
All Normal distributions can be converted into standard Normal distributions for ease of computing probabilities under the curve.
Standard Normal probability tables help calculate area under the curve.
SUMMARY