What  Customers
Are Saying  About  the
Excel Statistical Master

"I bought Mark Harmon's Excel Master Series manual as a reference for a graduate course on statistics that I was taking as part of an MBA program at the University of Delaware. I purchased the materials about halfway through the course and wish I had known about this manual from the start of the class!

Mark has done a great job in writing complex statistical concepts in an easy to understand format that makes grasping them both easy to understand and to use.

With the help of Mark's book, and some diligent studying, I received an A in my stats course. Thanks Mark! "

Chris Veale
Newark, Delaware

"I am a medical student at Semmelweis University and the Excel Statistical Master helped me so much with passing my midterms and my semifinal exam. There is no way I would have passed without it.

Even though I went to all of the classes and consultations, it was the Excel Statistical Master that taught me all of the basic concepts for the different tests we used.

Each test is explained in different steps and how you performed it on Excel. Illustrations and screenshots make it easy to follow, even for those like me that never had used Excel before.

I highly recommend Excel Statistical Master for all medical students. It's worth every dollar. And I have to say that the communication with the seller have been the best! If I had questions about statistics problems, he more than gladly answered them. It's so easy and saved my from hours with reading!

Thanks a lot!"

Annette Myhre
Medical Student
Semmelweis University
Budapest, Hungary

"After years of searching for a simplified statistics book, I found the Excel Statistical Master.

Unlike the indecipherable jargon in the countless books I have wasted money on, the language in this book is plain and easy to understand. This is the best $40 I have ever spent. "

Mahdi Raghfar
New York, New York
Excel
STATISTICAL
Master
 
Confidence Intervals
for the Graduate Student and Business Manager

Clear and Complete - WITH LOTS OF SOLVED PROBLEMS

The Confidence Interval

 Lots of Worked-Out, Easy-To-Understand, Graduate-Level Problems  --->
( Scroll Down and Take a Look ! )


Introduction

A Confidence Interval is an estimate of a population's average or proportion based upon sample data drawn from the population. A Confidence Interval is a range of values in which the mean is likely to fall with a specified level of confidence or certainty.


Table of Contents


Basic Explanation of Confidence Intervals
Mean Sampling vs. Proportion Sampling
Confidence Intervals of a Population Mean
Confidence Interval Calculation Using Large Samples
The Central Limit Theorem
Levels of Confidence and Significance
Population Mean vs. Sample Mean
Standard Deviation and Standard Error
Region of Certainty vs. Region of Uncertainty
Z Score
Formula for Calculating Confidence Interval Boundaries from Sample Data
Confidence Interval Desirable Properties of Data Sets
The 7 Most Common Correctable Causes of Sample Data Appearing Non-Normal

Problem 1: Calculate a Confidence Interval from a Random Sample of Test  Scores

Problem 2: Calculate a Confidence Interval of Daily Sales Based Upon Sample Mean and Standard Deviation

Problem 3: Calculate an Exact Range of 95% of Sales Based Upon the
Population Mean and Standard Deviation


Determine Minimum Sample Size to Limit Confidence Interval of Mean to a
Certain Width


Problem 4: Determine the Minimum Number of Sales Territories to Sample
In Order To Limit the 95% Confidence Interval to a Certain Width


Confidence Interval for a Population Proportion
Mean Sampling vs. Proportion Sampling
Levels of Confidence and Significance
Population Proportion vs. Sample Proportion
Standard Deviation and Standard Error
Region of Certainty vs. Region of Uncertainty
Z Score
Formula for Calculating Confidence Interval Boundaries from Sample Data

Problem 5: Determine Confidence Interval of Shoppers Who Prefer to Pay By
Credit Card Based Upon Sample Data


Determine Minimum Sample Size to Limit Confidence Interval of Proportion to a
Certain Width


Problem 6: Determine the Minimum Sample Size of Voters to be 95% Certain
that the Population Proportion is only 1% Different than Sample
Proportion

 


Click Here To See How To Create Confidence Intervals in Excel

Click Here To Send a Link To This Page To A Friend


Basic Explanation of Confidence Intervals


The Confidence Interval is an interval in which the true population mean or proportion probably lies based upon a much smaller random sample taken from that population.

Confidence Intervals for means are calculated differently than Confidence Intervals for proportions. The first half of this course module will discuss calculating a Confidence Interval for a population mean. The second half will cover calculating a Confidence Interval for a population proportion.


First we will briefly discuss the difference between sampling for mean and sampling for proportion:



Mean Sampling vs. Proportion Sampling


What determines whether a mean is being estimated or a proportion is being estimated is the number of possible outcomes of each sample taken.


Proportion samples have only two possible outcomes.
For example, if you are comparing the proportion of Republicans in two different cities, each sample has only two possible values; the person sampled either is a Republican or is not.


Mean samples have multiple possible outcomes.
For example, if you are comparing the mean age of people in two different cities, each sample can have numerous values; the person sampled could be anywhere from 1 to 110 years old.


Below is a description of how to calculate a Confidence Interval for a population's mean. Note that everything is almost the same as the calculation of the Confidence Interval for a population proportion, except sample standard error.




Confidence Interval of a Population Mean


The Confidence Interval of a Mean is an interval in which the true population mean probably lies based upon a much smaller random sample taken from that population.

A 95% Confidence Interval of a Mean is the interval that has a 95% chance of containing the true population mean.

The width of a Confidence Interval is affected by the sample size. The larger the sample size, the more accurate and tighter is the estimate of the true population mean. The larger the sample size, the smaller will be the Confidence Interval. Samples taken must be random and also be representative of the population.



Calculate Confidence Intervals Using Large Samples(n>30)

Confidence Intervals are usually calculated and plotted on a Normal curve. If the sample size is less then 30, the population must be known to be Normally distributed. If small-sample data (n<30) is used to plot the Confidence Interval of the Mean for a population that is not Normally distributed, the result can be totally wrong.

Probably the most common major mistake in statistics is to apply Normal or t distribution tests to small-sample data taken from a population of unknown distribution. Typically the actual distribution of a population is not known.

If the population's underlying distribution is not known (usually it is not), then only large samples (n>30) are valid for creating a Confidence Interval of the Mean. The most important theorem of statistics, the Central Limit Theorem, explains the reason for this.



The Central Limit Theorem


The Central Limit Theorem is statistics' most fundamental theorem. In a nutshell, it states the following: Random sample data can be plotted on a Normal curve to estimate a population's mean no matter how the population is distributed, as long as sample size is large (n>30).


The above definition of the Central Limit Theorem is the most practical and easy-to understand. The following definition of this theorem is a bit more technical and will satisfy statisticians (but basically says the same thing as the above): No matter how the population is distributed, the sampling distribution of the mean approaches the Normal curve as sample size becomes large.



Click Here To See How To Create Confidence Intervals in Excel

Levels of Confidence and Significance

Level of Significance, α ("alpha"), equals the maximum allowed percent of error. If the maximum allowed error is 5%, then α = 0.05.

Level of Confidence is the desired degree of certainty. A 95% Confidence Level is the most common. A 95% Confidence Level would correspond to a 95 Confidence Interval of the Mean. This would state that the actual population mean has a 95% probability of lying within the calculated interval. A 95% Confidence Level corresponds to a 5% Level of Significance, or α = 0.05. The Confidence Level therefore equals 1 - α.



Population Mean vs. Sample Mean


Population Mean
= µ ("mu") (This is what we are trying to estimate)

Sample Mean = xavg



Standard Deviation and Standard Error


Standard Deviation is a measure of statistical dispersion. Standard Deviation equals the square root of the Variance. It's formula is the following:

SQRT ( [ SUM (x - xavg)2 ] / N )


Population Standard Deviation = σ ("sigma")

Sample Standard Deviation = s

Standard Error is an estimate of population Standard Deviation from data taken from a sample. If the population Standard Deviation, σ, is known, then the Sample Standard Error, sxavg, can be calculated. If only the Sample Standard Deviation, s, is known, then Sample Standard Error, sxavg, can be estimated by substituting Sample Standard Deviation, s, for Population Standard Deviation, σ, as follows:

Sample Standard Error = sxavg = σ / SQRT(n) s / SQRT(n)

σ = Population standard deviation
s = Sample standard deviation
n = sample size



Region of Certainty vs. Region of Uncertainty


Region of Certainty
is the area under the Normal curve that corresponds to the required Level of Confidence. If a 95% percent Level of Confidence is required, then the Region of Certainty will contain 95% of the area under the Normal curve. The outer boundaries of the Region of Certainty will be the outer boundaries of the Confidence Interval.

The Region of Certainty, and therefore the Confidence Interval, will be centered about the mean. Half of the Confidence Interval is on one side of the mean and half on the other side.


Region of Uncertainty is the area under the Normal curve that is outside of the Region of Certainty. Half of the Region of Uncertainty will exist in the right outer tail of the Normal curve and the other half in the left outer tail. This is similar to the concept of the "two-tailed test" that is used in Hypothesis testing in further sections of this course. The concepts of one- and two-tailed testing are not used when calculating Confidence Intervals. Just remember that the Region of Certainty, and therefore the Confidence Interval, are always centered about the mean on the Normal curve.


Relationship Between Region of Certainty, Uncertainty, and Alpha - The Region of Uncertainty corresponds to a ("alpha"). If a = 0.05, then that Region of Uncertainty contains 5% of the area under the Normal curve. Half of that area (2.5%) is in each outer tail. The 95% area centered about the mean will be the Region of Certainty. The outer boundaries of this Region of Certainty will be the outer boundaries of the 95% Confidence Interval. The Level of Confidence is 95% and the Level of Significance, or maximum error allowed, is 5%.

Click Here To See How To Create Confidence Intervals in Excel


The 95% Confidence Interval

Illustrated With a Two-Tailed Normal Curve

Confidence Interval - Normal Distribution Curve - Two-Tailed - 95% Confidence Interval

 

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval Z Score, 2-Tailed, 95% Confidence Interval

 

The 95% Confidence Interval

Illustrated With a One-Tailed Normal Curve –

Right Tail

Confidence Interval - Normal Distribution Curve - One-Tailed Right - 95% Confidence Interval

 

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval Z Score, 1-Tailed, 95% Confidence Interval


The 95% Confidence Interval

Illustrated With a One-Tailed Normal Curve –

Left Tail

Confidence Interval - Normal Distribution Curve - One-Tailed Left - 95% Confidence Interval

 

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval Z Score, 1-Tailed, 95% Confidence Interval



Z Score


Z Score is the number of Standard Errors from the mean to outer right boundary of the Region of Certainty (and therefore to the outer right boundary of the Confidence Interval). Standard Errors are used and not Standard Deviations because sample data is being used to calculate the Confidence Interval.

It is very important to note that on a Standardized Normal Curve, the Distance from the mean to boundary of the Region of Certainty equals the number of standard errors from the mean to boundary, which is the Z Score.

The above is only true for a Standardized Normal Curve. It is NOT true for a Non Standardized Normal curve.


Z Score(1 - α) = Number of Standard errors from mean to boundary of Confidence Interval. Note that (1 - α/2) = the entire area in the Normal curve to the left of outer right boundary of the Region of Certainty, or Confidence Interval. This includes the entire Region of Certainty and the half of the Region of Uncertainty that exists in the left tail.

For example:

Level of Confidence = 95% for a 95% Confidence Interval
Level of Significance (α)= 5%
Two-Tailed Curve

1 - α = 0.95 = 95%

Z Score(95% 2-Tailed) = 1.96

The outer right boundary of the 95% Confidence Interval, and the Region of Certainty, is 1.96 Standard Errors from the mean. The left boundary is the same distance from the mean because the Confidence Interval is centered about the mean.

Confidence Interval - Normal Distribution Curve - Two-Tailed, 95% Confidence Interval

 

Z Score = 1.96

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval - Z Score Chart - 2-Tailed, 95% Confidence Interval

The above table indicates that 95% of the area under the Normal curve lies within 1.96 Standard Errors of the mean if a two-tailed test is used. This indicates that 2.5% of the total area lies outside 1.96 Standard Errors of the mean on either side of the mean if a two-tailed test is used.

 

Click Here To See How To Do Hypothesis Tests in Excel


Formulas for Calculating Confidence Interval

Boundaries from Sample Data

The Confidence Interval is the width of the Region of Certainty. The Region of Certainty extends the same distance from the mean to the left and to the right because the Normal curve is symmetrical about the mean.

The Confidence Interval extends to the left and to the right of the mean by the following distance:

Z Score(1-α) * Sample Standard Error

which equals

Z Score(1-α) * sxavg

Therefore: 

Confidence Interval Boundaries =
= Sample mean +/- Z Score(1-α) * Sample Standard Error

Confidence Interval Boundaries = xavg +/- Z Score(1-α) * sxavg

Confidence Interval Boundaries = xavg +/- Z Score(1-α) * σ / SQRT(n)

Confidence Interval Boundaries xavg +/- Z Score(1-α)) * s / SQRT(n)


As a result of:

 
Sample Mean = xavg

Sample Standard Deviation = s

Sample Standard Error = sxavg = σ / SQRT(n) s / SQRT(n)

Sample Size = n


For example, determine the 95% Confidence Interval if the Sample Size is 30, the Sample Standard Deviation = 3, the sample average  = 0, and the population from which the sample is drawn is Normally Distributed.

α - 1 - 95% =  1 - 0.95 = 0.05
Sample Standard Deviation
= s = 3
Sample Size = n = 30
Sample Mean = xavg = 0

Sample Standard Error = sxavg = σ / SQRT(n) s / SQRT(n) = 3 / SQRT(30)

Sample Standard Error
  3 / SQRT(30)  =  0.5477

Z Score(1-α) = Z Score(95%, 2-tailed) = 1.96

Confidence Interval Boundaries = xavg +/- Z Score(1-α) * sxavg

Confidence Interval Boundaries xavg +/- Z Score(1-α) * s / SQRT(n)

Confidence Interval Boundaries
0 +/- (1.96) * (0.5477) =  0 +/- 1.074

 

Confidence Interval - 2-Tailed, 95% Confidence Interval

 

 

Problem 1

Problem 1: Calculate a Confidence Interval from a Random Sample of Test Scores

Problem: Given the following set of 32 random test scores taken from a much larger population, calculate with 95% certainty an interval in which the population mean test score must fall. In other words, calculate the 95% Confidence Interval for the population test score mean. The random sample of 32 tests scores is shown below.

32 Random Test Score Samples from a Much Larger Population

220 300
370 410
500 540
640 660
220 300
370 410
500 540
640 660
220 300
370 410
500 540
640 660
220 300
370 410
500 540
640 660


Level of Confidence = 95% = 1 - α

Level of Significance = α = 0.05

Sample Size = n = 32

Sample Standard Deviation = s = 149.5

Sample Standard Error = sxavg = σ / SQRT(n) s / SQRT(n) = 149.8 / SQRT(32)

Sample Standard Error
  149.8 / SQRT(32)  =  26.5

Z Score(1-α) = Z Score(95%, 2-tailed) = 1.96

Confidence Interval Boundaries = xavg +/- Z Score(1-α) * sxavg

Confidence Interval Boundaries xavg +/- Z Score(1-α) * s / SQRT(n)

Confidence Interval Boundaries
455 +/- (1.96) * (26.5) =  455 +/- 51.9

Confidence Interval Boundaries 403.1  to  506.9

 

Confidence Interval - Problem 1 Graph 1 - 2-Tailed, 95% Confidence Interval

 

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval - Problem 1 Chart 1, Z Score, 95% Confidence Interval

This same problem above is solved in the Excel Statistical Master with only 3 Excel formulas (and NO looking anything up on a Z Chart). Everything is explained to you in SIMPLE language in the Excel Statistical Master.

Click Here To See How To Create Confidence Intervals in Excel

 

Problem 2

Problem 2: Calculate a Confidence Interval of Daily Sales Based Upon Sample Mean and Standard Deviation

Problem: Average daily demand for books sold in a small Barnes and Noble store is 455 books with a standard deviation of 200. This average and standard deviation are taken from sale data collected every day for a period of 60 days. What is the range that the true average daily book sales lies in with 95% certainty?

Level of Confidence = 95% = 1 - α

Level of Significance = α = 0.05

Sample Size = n = 60

Sample Mean = xavg = 455

Sample Standard Deviation = s = 200

Sample Standard Error = sxavg = σ / SQRT(n) s / SQRT(n)

sxavg = σ / SQRT(n) s / SQRT(n) = 200 / SQRT(60) = 25.8

Z Score(1 - α) = Z Score(95% 2-Tailed) = 1.96

Width of Half the Confidence Interval = Z Score(1-α) * sxavg

Width of Half the Confidence Interval = Z Score(1-α) * σ / SQRT(n)

Width of Half the Confidence Interval xavg +/- Z Score(1-α) * s / SQRT(n)

= 1.96 * 25.8 = 50.6

Confidence Interval Boundaries = xavg +/- Z Score(1-α) * sxavg

455 +/- (1.96)*(25.8) = 455 +/- 50.6 = 404.4 to 505.6

  

Confidence Interval  - Problem 2 Graph 1, 2-Tailed, 95% Confidence Interval

This same problem above is solved in the Excel Statistical Master with only 3 Excel formulas (and not having to look anything up on a Z Chart). The Excel Statistical Master teaches you everything in step-by-step frameworks. You'll never have to memorize any complicated statisical theory.

  

Problem 3

Problem 3: Calculate an Exact Range of 95% of Sales Based Upon the Upon the Population Mean and Standard Deviation


Problem: Average daily demand for books sold in a large Barnes and Noble store is 5,000 books with a standard deviation of 200. This average and standard deviation are taken from sale data collected every day for a period of 5 years. What is the range that 95% of the daily unit book sales falls in? The daily sales data is Normally distributed.


This problem is not a Confidence Interval problem. We do not need to create an estimate of the population mean (a Confidence Interval) because we know exactly what it is. We are given the population mean and population standard deviation.

We do need to know how the population is distributed in order to calculate the interval that contains 95% of all population data. Given that the population is Normally distributed, we simply need to map the region of this Normal curve that contains 95% of the total area and is centered about the mean as follows:

Population mean = µ = 5,000

Population Standard Deviation = σ = 200


Range Containing 95% of Sales Data = µ +/- [ Z Score(95% 2-Tailed) ]  * σ  ]

= 5,000 +/- [ 1.96 * 200 ]

= 5,000 +/- 392

= 4,608 to 5,392

 

Confidence Interval - Problem 3 Graph 1 - 2-Tailed 95% Confidence Interval

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval - Problem 3 Chart 1 - Z Score - 2-Tailed, 95% Confidence Interval

This same problem above is solved in the Excel Statistical Master with only 1 Quick Excel formula (and NO looking anything up on a Z Chart). You'll be able to grasp your statistics course a LOT easier with the Excel Statistical Master.

Click Here To See How To Create Confidence Intervals in Excel



Determining Minimum Sample Size (n) to Keep

Confidence Interval of the Mean within a Certain

Tolerance

The larger the sample size, the more accurate and tighter will be the prediction of a population's mean. Stated another way, the larger the sample size, the smaller will be the Confidence Interval of the population's mean. Width of the Confidence Interval is reduced when sample size is increased.

Quite often a population's mean needs to be estimated with some level of certainty to within plus or minus a specified tolerance. This specified tolerance is half the width of the Confidence Interval. Sample size directly affects the width of the Confidence Interval. The relationship between sample size and width of the Confidence Interval is shown as follows:

n = Min Sample Size Required to Keep Confidence Interval Within a
           Required Tolerance for a Given Confidence Level and
           Standard Deviation

Width of Half of the Confidence Interval = Z Score(1-α) * σ / SQRT(n)

Width of Half of the Confidence Interval Z Score(1-α) * sxavg

As a result of:

Z Score(1-α) * σ / SQRT(n)      Z Score(1-α) * s / SQRT(n)


With algebraic manipulation of the above we have:


SQRT(n) = Z Score(1-α) * σ / Width of Half of the Confidence Interval

n = [ Z Score(1-α) ]2 * [σ]2 / [ Width of Half of the Confidence Interval ]2


Also, if we only have Sample Standard Deviation, s, and not Population
Standard Deviation, σ :

n [ Z Score(1-α) ]2 * [s]2 / [ Width of Half of the Confidence Interval ]2

 

Problem 4

Problem 4: Determine the Minimum Number of Sales Territories to Sample In Order To Limit the 95% Confidence Interval to a Certain Width

Problem: A national sales manager in charge of 5,000 similar territories ran a nationwide promotion. He then collected sales data from a random sample of the  territories to evaluate sales increase. From the sample, the average sales increase per territory was $10,000 with a standard deviation of $500. How many territories would he have had to have sampled to be 95% sure that the actual nationwide average territory sales increase was no more than $50 different than average territory sales increase from the sample he took?



Level of Confidence = 95% = 1 - α

Level of Significance = α = 0.05

Sample Size = n = ?

n = Min Sample Size Required to Keep Confidence Interval Within a
           Required Tolerance for a Given Confidence Level and
           Standard Deviation


Sample Mean = xavg ------> Note this does not need to be known to solve this problem

Sample Standard Deviation = s = 500

Z Score(1 - α) = Z Score(95% 2-Tailed) = 1.96


Width of Half the Confidence Interval = 50

n [ Z Score(1-a) ]2 * [s]2 / [ Width of Half of the Confidence Interval ]2

n [ 1.96 ]2 * [500]2 / [ 50 ]2 = 384

The sales manager would have to sample at least 384 territories to be 95% certain that nationwide territory average was within +/- $50 of the sample territory average. Note that the 95% confidence interval is $10,000 +/- $50 and this interval has a width = $100 if sample size is 384.

This same problem above is solved in the Excel Statistical Master with only 2 Excel formulas (and not having to look anything up on a Z Chart). If you found your statistics book confusing, You'll really like the Excel Statistical Master. Everything is explained in simple, step-by-step frameworks.

 

 

Confidence Interval of a Population Proportion


Creating a Confidence Interval for a population's proportion is very similar to creating a Confidence Interval for a population's mean. The only real difference is how the standard error is calculated. Everything else is the same. The method of calculating a Confidence Interval for a population mean was covered in  detail earlier in this module. First, the difference between using sampling to estimate a population mean and using sampling to estimate a population proportion will be explained below:



Mean Sampling vs. Proportion Sampling


What determines whether a mean is being estimated or a proportion is being estimated is the number of possible outcomes of each sample taken.

Proportion samples have only two possible outcomes. For example, if you are comparing the proportion of Republicans in two different cities, each sample has only two possible values; the person sampled either is a Republican or is not.

Mean samples have multiple possible outcomes. For example, if you are comparing the mean age of people in two different cities, each sample can have numerous values; the person sampled could be anywhere from 1 to 110 years old.


Below is a description of how to calculate a Confidence Interval for a population's proportion. Note that everything is almost the same as the calculation of the Confidence Interval for a mean, except sample standard error.




Levels of Confidence and Significance

Level of Significance, α ("alpha"), equals the maximum allowed percent of error. If the maximum allowed error is 5%, then α = 0.05.

Level of Confidence is selected by the user. A 95% Level is the most common. A 95% Confidence Level would correspond to a 95% Confidence Interval of the Proportion. This would state that the actual population Proportion has a 95%probability of lying within the calculated interval. A 95% Confidence Level corresponds to a 5% Level of Significance, or α, = 0.05. The Confidence Level therefore equals 1 - α.

Population Proportion vs. Sample Proportion


Population Proportion = µp = p (This is what we are trying to estimate)

Sample Proportion = pavg

 

Standard Deviation and Standard Error

Standard Deviation is not calculated during the creation of Confidence Interval for a population proportion.

Standard Error is an estimate of population Standard Deviation from data taken from a sample. Sample Standard Error will be an estimate taken from the sample proportion, pavg, and sample size, n. This is the major difference between calculating a Confidence Interval for a proportion and for a mean. Binomial distribution rules apply to proportions because a proportion sample has only two possible outcomes, just like a binomial variable.


Sample Standard Error of a Proportion = σpavg
σpavg = SQRT(p * q / n) spavg

Estimated Sample Standard Error of a Proportion = spavg
spavg = SQRT ( pavg * qavg / n )

p = Population proportion - This is the unknown that will be estimated with a Confidence Interval

q = 1 - p

n = Sample Size

pavg = Sample Proportion

qavg = 1 - pavg


  

Region of Certainty vs. Region of Uncertainty


Region of Certainty is the area under the Normal curve that corresponds to the required Level of Confidence. If a 95% percent Level of Confidence is required, then the Region of Certainty will contain 95% of the area under the Normal curve. The outer boundaries of the Region of Certainty will be the outer boundaries of the Confidence Interval.

The Region of Certainty, and therefore the Confidence Interval, will be centered about the mean. Half of the Confidence Interval is on one side of the mean and  half on the other side.

Region of Uncertainty is the area under the Normal curve that is outside of the Region of Certainty. Half of the Region of Uncertainty will exist in the right outer tail of the Normal curve and the other half in the left outer tail. This is similar to the concept of the "two-tailed test" that is used in Hypothesis testing in further sections of this course. The concepts of one and two-tailed testing are not used when calculating Confidence Intervals. Just remember that the Region of Certainty, and therefore the Confidence Interval, are always centered about the mean on the Normal curve.


Relationship Between Region of Certainty, Uncertainty, and Alpha - The Region of Uncertainty corresponds to a ("alpha"). If α = 0.05, then that Region of Uncertainty contains 5%  of the area under the Normal curve. Half of that area (2.5%) is in each outer tail. The 95% area centered  about the mean will be the Region of Certainty. The outer boundaries of this Region of Certainty will be the outer boundaries of the 95% Confidence Interval. The Level of Confidence is 95% and the Level of Significance, or maximum error allowed, is 5%.

 

Z Score


Z Score is the number of Standard Errors from the mean to outer right boundary of the Region of Certainty (and therefore to the outer right boundary of the Confidence Interval). Standard Errors are used and not Standard Deviations because sample data is being used to calculate the Confidence Interval.

It is very important to note that on a Standardized Normal Curve, the Distance from the mean to boundary of the Region of Certainty equals the number of standard errors from the mean to boundary, which is the Z Score.

The above is only true for a Standardized Normal Curve. It is NOT true for a Non Standardized Normal curve.


Z Score(1 - α) = Number of Standard errors from mean to boundary of Confidence Interval. Note that (1 - α/2) = the entire area in the Normal curve to the left of outer right boundary of the Region of Certainty, or Confidence Interval. This includes the entire Region of Certainty and the half of the Region of Uncertainty that exists in the left tail.

For example:

Level of Confidence = 95% for a 95% Confidence Interval
Level of Significance (α)= 5%
Two-Tailed Curve

1 - α = 0.95 = 95%

Z Score(95% 2-Tailed) = 1.96

The outer right boundary of the 95% Confidence Interval, and the Region of Certainty, is 1.96 Standard Errors from the mean. The left boundary is the same distance from the mean because the Confidence Interval is centered about the mean.

Confidence Interval - Normal Distribution - 2-Tailed, 95% Confidence Interval

 

Z Score = 1.96

Z Score Chart

Z Score at x (Inner Numbers - Yellow)
vs.
Area Under Normal Curve
Between Mean (µ) and x  (Outer Numbers - Green) 

Confidence Interval - Z Score Explanation - 2-Tailed, 95% Confidence Interval

The above table indicates that 95% of the area under the Normal curve lies within 1.96 Standard Errors of the mean if a two-tailed test is used. This indicates that 5% of the total area lies outside 1.96 Standard Errors of the mean in either side of the mean if a two-tailed test is used.



Formula for Calculating Confidence Interval

Boundaries from Sample Data for a Population

Proportion

Confidence Interval Boundaries
= Sample proportion +/- Z Score(1-α) * Sample Standard Error


Confidence Interval Boundaries
= pavg +/- Z Score(1-α) * spavg

Sample Proportion = pavg

Sample Standard Error of a Proportion = σpavg spavg

Sample Standard Error of a Proportion = SQRT ( pavg * qavg / n )

Sample size = n


Confidence Interval Boundaries = pavg +/- Z Score(1-α) * spavg



Click Here To See How To Create Confidence Intervals in Excel


Problem 5

Problem 5: Determine Confidence Interval of Shoppers Who Prefer to Pay By Credit Card Based Upon Sample Data

Problem: A random sample of 1,000 shoppers was taken. 70% preferred to pay with a credit card. 30% preferred to pay with cash. Determine the 95% Confidence Interval for the proportion of the general population that prefers to pay with a credit card.


Level of Confidence = 95% = 1 - α

Level of Significance = α = 0.05

Sample Size = n = 1,000

Sample Proportion = pavg = 0.70

qavg = 1 - pavg = 0.30

Sample Standard Error of a Proportion = σpavg spavg
spavg = SQRT ( pavg * qavg / n )

spavg = SQRT ( 0.70 * 0.30 / 1,000 ) = 0.014

Z Score(1 - α) = Z Score95% = 1.96

Width of Half the Confidence Interval = Z Score(1-α) * spavg

= 1.96 * 0.014 = 0.0274

Confidence Interval Boundaries = pavg +/- Z Score(1-a) * spavg

= 0.70 +/- (1.96) * (0.014)

= 0.70 +/- 0.0274 = 0.6726 to 0.7274 = 67.26% to 72.74%

 

Confidence Interval - Problem 5 Graph 1 - 2-Tailed, 95% Confidence Interval

This same problem above is solved in the Excel Statistical Master with only 2 Excel formulas. With the Excel Statistical Master you can do advanced business statistics without having to buy and learn expensive, complicated statistical software packages such as SyStat, MiniTab, SPSS, or SAS.

 


Determining Minimum Sample Size (n) to Keep

Confidence Interval of the Proportion within a Certain

Tolerance

The larger the sample size, the more accurate and tighter will be the prediction of a population's mean. Stated another way, the larger the sample size, the smaller will be the Confidence Interval of the population's mean. Width of the Confidence Interval is reduced when sample size is increased.

Quite often a population's mean needs to be estimated with some level of certainty to within plus or minus a specified tolerance. This specified tolerance is half the width of the Confidence interval. Sample size directly affects the width of the Confidence Interval. The relationship between sample size and width of the Confidence Interval is shown as follows:


Width of Half the Confidence Interval =Z Score(1-α)* spavg

spavg = SQRT( pavg* qavg/ n)

Width of Half the Confidence Interval = Z Score(1-α)*SQRT( pavg* qavg/ n)

[ Width of Half the Confidence Interval ]2 =[Z Score(1-α) ]2*( pavg* qavg/ n)

n = [ Z Score(1-α)]2 * ( pavg * qavg ) / [ Width of Half the Confidence Interval ]2




Problem 6: Determine the Minimum Sample Size of Voters to be 95% Certain that the Population Proportion is no more than 1% Different from Sample Proportion.

Problem: A random survey was conducted in one city to learn voting preferences. 40% of voters surveyed said they would vote Republican. 60% of the voters surveyed said they would vote Democrat. Determine the minimum number of voters that had to be surveyed to be 95% certain that the results were accurate within
+/- 1%.



Level of Confidence = 95% = 1 - α

Level of Significance = α = 0.05

pavg = 0.40

qavg = 1 - pavg = 0.60

Width of Half the Confidence Interval = 0.01 ---> (1%)

Z Score(1 - a) = Z Score95% = 1.96

n = [ Z Score(1-α) ]2 * ( pavg * qavg ) / [ Width of Half the Confidence Interval ]2

n = [ 1.96 ]2 * ( 0.40 * 0.60 ) / [ 0.01 ]2 = 9,220


At least 9,220 random voters had to be surveyed to be 95% certain that the population proportion is no more than 1% different from the sample.

This same problem above is solved in the Excel Statistical Master with only 2 Excel formulas. The Excel Statistical Master is for you if you want to know how to apply statistics to solve real world business problems.

Click Here To See How To Create Confidence Intervals in Excel

 

If You Like This, Then Share It... Dig this Stumble upon Delicious Technorati Reddit Buzz it Twitthis

 

 

Statistics in Excel Home      Normal Distribution    

t Distribution     Binomial Distribution

Regression     Confidence Intervals     Combinations and Permutations

Correlation and Covariance     ANOVA     Other Useful Distributions

Statistical Training Videos     Statistics Blog     Statistics Jobs

Latest Manuals in the Excel Master Series

Contact Information     Statistics Jobs

Internet Marketing Book Review

Site Map

Blog Entries
Using Logistic Regression in Excel To Predict If a Prospect Will Buy
Taguchi Testing - What Is It and Is It a Good For Landing Page Optimization?
How To Use the Chi-Square Independence Test in Excel to Figure Out What Makes Your Customers Buy More
How To Solve ALL Hypothesis Tests in Only 4 Steps
Statistical Mistakes You Don't Want To Make
A Quick Normality Test Easily Done in Excel
The 7 Most Common Correctable Causes of Sample Data Appearing Non-Normal
How To Use the t Test in Excel To Find Out If Your New Marketing Is Working
Using the Excel t Test To Find Out What Your Best Sales Days Are
Nonparametric Tests - Completed Examples in Excel
Nonparametric Tests - How To Do the 4 Most Important in Excel
Nonparametric Tests - When the Marketer Should Use Them
Using All 3 Types of ANOVA in Excel to Improve Your PPC Marketing
Comparing Doing ANOVA in Excel with Doing It By Hand
How To Use the Chi-Square Variance Test to Find Out If Your Customers Are Becoming More or Less Focused In Their Spending
Using ANOVA in Excel to Increase Click-Through Rate
The Two Crucial Steps to Excel Regression That Most People Skip
How To Quickly Read the Output of Excel Regression
Work-Arounds for Excel 2003 and Excel 2007s Biggest Statistical Omissions
How To Build a Better Split-Tester in Excel Than the Google Website Optimizer
How To Use Dummy Variable Regression in Excel to Perform Conjoint Analysis
The Chi-Square Goodness-of-Fit Test - Excel's Easiest Normality Test
The Mann-Whitney U Test Done in Excel
The Kruskal-Wallis Test Done in Excel
The Spearman Correlation Coefficient Test Done in Excel
The Sign Test (Nonparametric) Done in Excel
The Wilcoxon Rank Sum Test Done in Excel
The Wilcoxon Signed-Rank Test for Small Samples Done in Excel
The Wilcoxon Signed-Rank Test for Large Samples Done in Excel
Excel's Most Basic Forecasting Tool - The Simple Moving Average
The Weighted Moving Average - A Basic and Accurate Excel Forecasting Tool
Excel Forecasting Tool #3 - Exponential Smoothing
Normal Distribution's 4 Most Important Excel Functions
Using the Normal Distribution To Find Your Sales Limits
Using the Hypothesis Test in Excel To Find Out If Your Advertising Worked
Using the Hypothesis Test in Excel To Find Out If Your Delivery Time Worsened
Using the Hypothesis Test in Excel To Test Your Headlines
Creating a Confidence Interval in Excel To find Your Customer Preferences
Creating a Confidence Interval in Excel To Find Your Real Daily Sales
Using the Normal Distribution To Find Your Range of Daily Sales
SPC Control Charts in Excel - Done Properly
Excel Model Building - Experts vs. Non-experts
Using the Excel Solver To Optimize Your Internet Marketing Budget
Is SPC Limited By The Central Limit Theorem?
Top 10 SEO Excel Functions - You'll Like These!
How To Use the Excel Solver to Find Your Sales Curve
How To Use If-Then-Else In Excel To Remove Matches From 2 Lists
How To Use VLOOKUP In Excel To Find Matches From 2 Lists In 2 Steps
Pivot Tables - How To Set Up Pivot Tables Correctly Every Time
Pivot Tables - One Easy Step That Will Double the Effectiveness of All of Your Pivot Tables!
VLOOKUP - Just Like Looking Up a Number In The Telephone Book
VLOOKUP - Looking Up a Quantity Discount in a Distant Excel Spreadsheet With VLOOKUP




Statistical Training Videos
How To Use Logistic Regression in Excel To Predict If Your Prospect Will Buy
How To Use the Chi-Square Independence Test in Excel to Find Out What Makes Your Customer Make Bigger Orders
How to Graph the Normal Distribution's Probability Density Function in Excel
How To Use Dummy Variable Regression in Excel to Perform Conjoint Analysis
How To Use All 3 Built-In Types of ANOVA in Excel to Improve PPC Marketing
How To Use the Chi-Square Variance Test in Excel to Find Out If Your Customers Are More Focused In Their Spending
How To Create a Histogram and Pareto Chart in Excel
How To Improve a Twitter Follower Acquisition Program with a Histogram in Excel
How To Use SocialOomph - The Most Versatile and Popular Tweet Automation Tool
How To Graph the Normal Distribution's Cumulative Distribution Function in Excel
How to Build a Better Split-Tester in Excel Than the Google Website Optimizer
How To Do the 4 Steps to Regression in Excel - Including the 2 Crucial Steps That Are Almost Always Ignored
Excel Regression Output - How to Quickly Read and Understand It
How To Use ANOVA in Excel to Increase Click-Through Rate in a Pay-Per-Click Campaign
How To Do ANOVA in Excel and also by Hand - Single-Factor ANOVA
Work-Arounds for Excel 2003 and Excel 2007's Biggest Statistical Omissions
How To Graph the Students t Distributions' Probability Density Function in Excel
How To Graph the Chi-Square Distribution's Probability Density Function in Excel
How To Graph the Weibull Distribution's PDF and CDF - in Excel
How To Add Followers To Your Twitter Account Faster Than You Thought Possible
How To Solve Problems with the Weibull Distribution in Excel
How To Solve Problems with the Gamma Distribution in Excel