DISTRIBUTIONS OF STATISTICS FROM DATA

ST101 – DR. ARIC LABARR


REVIEW

Population – set of all objects/individuals of interest.

Sample – subset of the population from which information is actually obtained.

Statistic – measure computed from a sample.

Parameter – measure computed from a population.




POINT ESTIMATORS

Point Estimator (Statistic)

Population Parameter

Sample statistics are point estimates (single-number estimates) of population parameters.

Different population parameters have different corresponding sample statistics.
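As a minimal sketch of these pairings, the snippet below computes three common point estimates from one sample. The data values and the "above 12" threshold are hypothetical, chosen only for illustration, and the pairings shown (sample mean for the population mean, sample standard deviation for the population standard deviation, sample proportion for the population proportion) are the usual textbook ones rather than a list taken from the slide.

```python
import statistics

# Hypothetical sample (illustrative values, not from the slides).
sample = [12, 15, 9, 20, 14]

# Sample mean: point estimate of the population mean.
x_bar = statistics.mean(sample)

# Sample standard deviation (n - 1 divisor): point estimate of the
# population standard deviation.
s = statistics.stdev(sample)

# Sample proportion of values above 12: point estimate of the
# population proportion above 12.
p_hat = sum(value > 12 for value in sample) / len(sample)

print(x_bar, round(s, 2), p_hat)
```

Each statistic is a single number, and each one targets a different parameter.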


SAMPLES ARE ESTIMATES

Samples are estimates of the population.

Statistics are estimates of the parameters.

With any estimation comes a chance of making errors.


SAMPLING ERROR

Population:   1,   3,   5,   5,   7,   9,   4,   6,   10,   2

Population mean = 5.2

Sample 1:   1,   10,   6,   9

Sample 1 mean = 6.5

Sample 2:   1,   3,   2,   5

Sample 2 mean = 2.75

Both estimates are wrong!


SAMPLING ERROR

Samples are estimates of the population.

Statistics are estimates of the parameters.

With any estimation comes a chance of making errors.

Sampling error is the difference between a sample point estimate and the corresponding population parameter.
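The definition can be checked on the slide's own numbers. This is a small sketch that takes the ten listed values as the whole population; the helper name `sampling_error` and the sign convention (estimate minus parameter) are assumptions made for illustration.

```python
from statistics import mean

population = [1, 3, 5, 5, 7, 9, 4, 6, 10, 2]
samples = {
    "Sample 1": [1, 10, 6, 9],
    "Sample 2": [1, 3, 2, 5],
}

# Population parameter: known here only because the population is tiny.
mu = mean(population)

def sampling_error(sample, parameter):
    """Point estimate minus the parameter it estimates (assumed sign convention)."""
    return mean(sample) - parameter

errors = {name: sampling_error(values, mu) for name, values in samples.items()}

for name, err in errors.items():
    print(f"{name}: mean = {mean(samples[name]):.2f}, sampling error = {err:+.2f}")
```

Sample 1 overshoots the population mean and Sample 2 undershoots it, so the two errors have opposite signs.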


SAMPLING ERROR

The difference between each sample mean and the population mean is the sampling error!

The sample statistic is typically all we know! We rarely have the parameter available to measure the sampling error.

If sample statistics (like the sample mean) had a predictable pattern, then the errors would have a typical pattern as well!
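For a population this small, the pattern can be listed exhaustively. The sketch below assumes samples of size 4 drawn without replacement (matching the two example samples) and treats the two 5s as separate individuals; it prints a rough text histogram of every possible sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 5, 7, 9, 4, 6, 10, 2]
n = 4

# Every possible sample of size 4 without replacement: C(10, 4) = 210 samples.
# Fractions keep the means exact so equal means are counted together.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Count how often each sample mean occurs and print a text histogram.
pattern = Counter(sample_means)
for value in sorted(pattern):
    print(f"{float(value):5.2f} {'#' * pattern[value]}")

# The average of all possible sample means.
mean_of_means = sum(sample_means) / len(sample_means)
print("mean of all sample means:", float(mean_of_means))
```

The sample means pile up around the population mean of 5.2, and their average is exactly 5.2, so the sampling errors follow a pattern too.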


SUMMARY

Sample statistics are point estimates (single-number estimates) of population parameters.

Sampling error occurs when there is a difference between a sample point estimate and the corresponding population parameter.

If sample statistics (like the sample mean) had a predictable pattern, then the errors would have a typical pattern as well.


 


SAMPLING DISTRIBUTION


 



MANY, MANY SAMPLES

Population: Normal

Mean: 0

S.D.: 1

Sample 1: -1.4,  0.2,  -1.7,  2.1,  -2.0,  0.5,  1.6,  -1.2,  0.6,  0.2

 

Sample 2: 0.8,  -0.3,  -0.6,  -1.1,  -1.3,  0.4,  -0.9,  -0.4,  -1.0,  -1.2

 

Sample 3: -0.2,  2.2,  0.7,  0.5,  1.2,  -0.1,  -0.6,  -0.6,  0.7,  -0.6

 

Sample 4: 2.0,  -1.2,  1.6,  0.6,  -0.8,  1.2,  0.8,  0.9,  0.5,  -1.2

 

 


DISTRIBUTION OF SAMPLE MEANS

What is the distribution of the sample means?
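One way to explore the question is by simulation. The sketch below is not from the slides: it draws many samples of size 10 from a Normal population with mean 0 and S.D. 1, then summarizes the sample means. The seed and the number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(101)  # arbitrary seed so the run is repeatable

N_SAMPLES = 10_000   # "many, many samples" (arbitrary count)
SAMPLE_SIZE = 10     # each sample on the slide has 10 values

# One sample mean per simulated sample from Normal(mean 0, S.D. 1).
sample_means = [
    statistics.fmean([random.gauss(0, 1) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"mean of sample means: {center:.3f}")
print(f"S.D. of sample means: {spread:.3f}")
print(f"1 / sqrt(10):         {1 / SAMPLE_SIZE ** 0.5:.3f}")
```

The sample means center on the population mean, and their spread is close to 1/√10, much narrower than the population's S.D. of 1.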

 

 

 


MANY, MANY SAMPLES

Population: Uniform

Mean: 0

S.D.: 1

 

 

 

Sample 1: 0.7,  -0.8,  -0.2,  0.1,  -0.6,  1.5,  1.6,  -0.7,  0.7,  0.4

 

Sample 2: -1.0,  -0.5,  0.1,  -1.2,  0.1,  1.7,  1.5,  1.1,  -1.7,  -0.8

 

Sample 3: -0.9,  -1.7,  0.2,  0.1,  1.3,  -1.4,  -1.2,  0.3,  -0.1,  1.5

 

Sample 4: -0.6,  -0.2,  0.8,  0.8,  -0.7,  -0.6,  1.6,  -0.6,  0.6,  -0.1

 


DISTRIBUTION OF SAMPLE MEANS

What is the distribution of the sample means?
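A simulation sketch for the Uniform case, under the assumption that the population is Uniform(-√3, √3), which is the uniform distribution with mean 0 and S.D. 1. To judge how bell-shaped the sample means are, it compares the share of means within one and two standard errors of 0 against the Normal benchmarks of roughly 68% and 95%. The seed and sample count are arbitrary.

```python
import math
import random
import statistics

random.seed(2022)  # arbitrary seed so the run is repeatable

HALF_WIDTH = math.sqrt(3)   # Uniform(-sqrt(3), sqrt(3)) has mean 0 and S.D. 1
SAMPLE_SIZE = 10
N_SAMPLES = 10_000

# One sample mean per simulated sample from the Uniform population.
sample_means = [
    statistics.fmean(
        [random.uniform(-HALF_WIDTH, HALF_WIDTH) for _ in range(SAMPLE_SIZE)]
    )
    for _ in range(N_SAMPLES)
]

# Standard deviation of the sample mean: population S.D. / sqrt(n).
standard_error = 1 / math.sqrt(SAMPLE_SIZE)

within_1 = sum(abs(m) <= standard_error for m in sample_means) / N_SAMPLES
within_2 = sum(abs(m) <= 2 * standard_error for m in sample_means) / N_SAMPLES

print(f"within 1 standard error:  {within_1:.3f}  (Normal benchmark about 0.68)")
print(f"within 2 standard errors: {within_2:.3f}  (Normal benchmark about 0.95)")
```

Even though the population is flat, the sample means behave much like a Normal distribution.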

 

 


 

CENTRAL LIMIT THEOREM
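A standard textbook statement of the theorem, written with $\mu$ and $\sigma$ for the population mean and standard deviation and $n$ for the sample size (the notation is an assumption, not taken from the slide):

```latex
\bar{X} \;\approx\; \text{Normal}\!\left(\text{mean} = \mu,\;\; \text{S.D.} = \frac{\sigma}{\sqrt{n}}\right)
\qquad \text{for sufficiently large } n
```

The approximation holds whatever the shape of the population, provided $\sigma$ is finite; if the population itself is Normal, the result is exact for every $n$.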


 


SAMPLING DISTRIBUTION BIKE DATA EXAMPLE

The average daily number of total users is 4,504 with a standard deviation of 1,937. What is the probability that a sample of 50 days has an average between 4,000 and 5,000 total users?

By the Central Limit Theorem, all of the possible sample means (from samples of size 50) would be approximately Normal with mean 4,504 and standard deviation 1,937/√50 ≈ 273.9.
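The probability in this example can be sketched with Python's standard library, assuming the Normal approximation from the Central Limit Theorem applies to samples of 50 days. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 4504      # average daily number of total users
sigma = 1937   # standard deviation of daily total users
n = 50         # number of days in the sample

# Standard deviation of the sample mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# P(4,000 < sample mean < 5,000).
prob = sampling_dist.cdf(5000) - sampling_dist.cdf(4000)

print(f"standard error: {standard_error:.1f}")
print(f"P(4000 < mean < 5000): {prob:.3f}")
```

Under these assumptions the probability comes out at roughly 0.93: both 4,000 and 5,000 lie about 1.8 standard errors from 4,504.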


 

 

 

 


 

SAMPLE SIZE AND SAMPLING DISTRIBUTION
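One way to see the effect of sample size, assuming the standard deviation of the sample mean is σ/√n and reusing σ = 1,937 from the bike data; the sample sizes are arbitrary illustrations.

```python
from math import sqrt

SIGMA = 1937  # population standard deviation from the bike data

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

# Arbitrary sample sizes chosen to show the trend.
for n in (10, 50, 200, 1000):
    print(f"n = {n:>4}: standard error = {standard_error(SIGMA, n):7.1f}")
```

Larger samples give a narrower sampling distribution, but the gain is slow: quadrupling n only halves the standard error.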

 




 

SUMMARY


 


PROPORTIONS


 


Sample proportions are similar to sample means.

Customer ID   Gender   Gender Numeric
001           M        0
002           F        1
003           F        1
004           M        0
005           M        0
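The similarity can be made concrete with the five customers in the table: once F is coded as 1 and M as 0, the sample proportion of F is just the sample mean of the numeric column. A minimal sketch (variable names are illustrative):

```python
from statistics import mean

gender = ["M", "F", "F", "M", "M"]   # customers 001 to 005

# Same 0/1 coding as the "Gender Numeric" column: F -> 1, M -> 0.
gender_numeric = [1 if g == "F" else 0 for g in gender]

# The mean of the 0/1 column ...
p_hat = mean(gender_numeric)

# ... equals the count of F divided by the sample size.
proportion_f = gender.count("F") / len(gender)

print(p_hat, proportion_f)
```

Because a proportion is a mean of 0/1 values, the ideas developed for sample means carry over to sample proportions.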

 

 

 


 

 

 

 


 

 

 

 

HOW LARGE IS LARGE ENOUGH?

The sample should have at least 5 in each of the two categories!

For values of p near 0.5, sample sizes as small as 10 can support a Normal approximation.

With very small (approaching 0) or large (approaching 1) values of p, much larger samples are needed.
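A small sketch of this rule of thumb. It reads "at least 5 in each of the two categories" as a condition on the expected counts n·p and n·(1 − p), which is an assumption about the slide's intent; the function name `normal_approx_ok` is hypothetical.

```python
def normal_approx_ok(n, p, minimum=5):
    """Check that both categories have an expected count of at least `minimum`."""
    expected_in = n * p          # expected count in the category of interest
    expected_out = n * (1 - p)   # expected count in the other category
    return expected_in >= minimum and expected_out >= minimum

# Illustrative cases: p near 0.5 needs few observations, extreme p needs many.
for n, p in [(10, 0.5), (50, 0.63), (50, 0.05), (200, 0.05)]:
    print(f"n = {n:>3}, p = {p:.2f}: Normal approximation OK? {normal_approx_ok(n, p)}")
```

With p = 0.5 a sample of 10 just meets the rule, while p = 0.05 fails at n = 50 and passes at n = 200.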


 

 

 


 

SAMPLING DISTRIBUTION

 

 


SAMPLING DISTRIBUTION BIKE DATA EXAMPLE

You think that people are more likely to rent a bike on a clear or cloudy day than on a misty, rainy, or snowy day. In your data, 63% of the days are clear or cloudy. What is the probability that you sample 50 days and fewer than half of them are clear or cloudy?
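A sketch of this calculation, assuming the sample proportion is approximately Normal with mean p and standard deviation √(p(1 − p)/n), and applying no continuity correction. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.63   # proportion of clear or cloudy days in the data
n = 50     # number of days sampled

# Standard deviation of the sample proportion.
standard_error = sqrt(p * (1 - p) / n)

# Approximate sampling distribution of the sample proportion.
sampling_dist = NormalDist(mu=p, sigma=standard_error)

# P(sample proportion < 0.5): fewer than half the days are clear or cloudy.
prob_less_than_half = sampling_dist.cdf(0.5)

print(f"standard error: {standard_error:.4f}")
print(f"P(p-hat < 0.5): {prob_less_than_half:.3f}")
```

Under these assumptions the probability is a little under 3%, since 0.5 sits about 1.9 standard errors below 0.63. Both expected counts (31.5 and 18.5) are well above 5, so the Normal approximation is reasonable here.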

SUMMARY


Last modified: Monday, 17 October 2022, 13:21