Continuous Probability Distributions, Normal Distribution, Central Limit Theorem, T-Distribution
This guide focuses on foundational concepts needed for Applied Statistics and Intro to Statistics. It assumes general knowledge of how to navigate Excel and use equations in it. See the Microsoft Office tech tutorial links for Excel tutorials.
Probability distributions allow us to find the probability that a value falls a certain number of standard deviations above or below the mean. When distributions are approximately normal, we can use the empirical rule to find probabilities for different values of (x).
The central limit theorem allows us to make comparisons using the means of different random samples. With this theorem, we can find z-scores or t-statistics that tell us how far away a sample mean is from the true population mean, and how likely it is for that mean to occur. This will be useful when we do statistical tests like hypothesis testing.
Continuous Probability Distributions
Probability Density Function (PDF)
A PDF shows the relative likelihood of all values for a continuous random variable.
The total area under the curve of a PDF must equal 1, or 100%.
Probability cannot be negative, so a PDF cannot have negative area.
Examples:
A PDF in which there is equal likelihood for all values of x (Uniform Distribution). A and B are equally likely.
A PDF in which extreme values (lowest and highest) are the least likely and the middle value is the most likely (bell curve)
A PDF in which the likelihood increases as x increases (higher x is more likely than lower x)
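As a rough check of the rule that the total area under a PDF equals 1, the sketch below approximates the area under two of the example shapes. The helper names, the uniform range of 0 to 4, and the integration limits are assumptions chosen for illustration; they do not come from this guide's figures.

```python
from statistics import NormalDist

def area_under_curve(pdf, lower, upper, steps=10_000):
    """Approximate the area under pdf between lower and upper (trapezoid rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width

def uniform_pdf(x, a=0.0, b=4.0):
    """Uniform PDF on [a, b]: constant height 1 / (b - a) inside, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Hypothetical bell curve: mean 0, standard deviation 1.
bell = NormalDist(mu=0.0, sigma=1.0)

uniform_area = area_under_curve(uniform_pdf, 0.0, 4.0)
# The bell curve never quite reaches 0, so integrate over a wide range.
bell_area = area_under_curve(bell.pdf, -8.0, 8.0)

print(round(uniform_area, 4), round(bell_area, 4))
```

Both areas should come out very close to 1, even though the two curves have very different shapes.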
Cumulative Distribution Function (CDF)
The Cumulative Probability: the likelihood that a value is at most x
Starts at 0 and ends at 1 (100%)
Cannot decrease (that would imply a negative probability).
Example:
The probability that x is 175 or less is approximately 0.4, or 40%.
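The CDF properties can be checked numerically. The sketch below uses a hypothetical normal distribution with mean 180 and standard deviation 20, picked only so that the probability at 175 lands near 0.4; these parameters are an assumption and are not read from the guide's figure.

```python
from statistics import NormalDist

# Hypothetical distribution, chosen only for illustration.
dist = NormalDist(mu=180.0, sigma=20.0)

xs = [100, 140, 160, 175, 180, 200, 260]
cdf_values = [dist.cdf(x) for x in xs]  # P(X <= x) for each x

for x, p in zip(xs, cdf_values):
    print(f"P(X <= {x}) = {p:.3f}")

# A CDF never decreases as x increases.
never_decreases = all(a <= b for a, b in zip(cdf_values, cdf_values[1:]))
print("Never decreases:", never_decreases)
```

The values start near 0 on the far left, pass through about 0.4 at x = 175, and approach 1 on the far right.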
Normal Distribution
The Normal Distribution appears frequently in nature. It is used in science, psychology, business, and more.
In a normal distribution, the center of the distribution is the mean (μ).
A normal distribution is shaped like a bell curve.
The spread of a normal distribution is the standard deviation (σ).
The Empirical Rule
The empirical rule states that in a normal distribution approximately 68.3% of the data lies within 1 standard deviation of the mean, approximately 95.4% lies within 2 standard deviations, and approximately 99.7% lies within 3 standard deviations.
This image shows what percentage of the data is under each section of the curve based on the empirical rule.
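The percentages in the empirical rule can be reproduced from the normal CDF. This is a minimal sketch using Python's standard library; the function name share_within is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"Within {k} standard deviation(s): {share:.1%}")
```

Because every normal distribution can be standardized, the same shares apply for any mean and standard deviation.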
Outliers
A value more than 3 standard deviations above or below the mean is considered an outlier.
A value more than 2 standard deviations above or below the mean is considered unusual.
Z-Score
A z-score standardizes how far a datum is from the mean. Z-scores are measured in standard deviations. A z-table gives the probability that a value falls at or below a given z-score. You can also use Excel instead of a z-table.
Symbols:
z = z-score
x = the value of your datum
μ = the mean
σ = standard deviation
Equation:
z = (x-μ) / σ
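The equation can be applied directly. In the sketch below, the exam-score numbers (mean 70, standard deviation 10) are made up for illustration, and the label "typical" is an assumed name for values that are neither unusual nor outliers under this guide's thresholds.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

def describe(z):
    """Label a z-score using the 2 and 3 standard deviation thresholds."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"

# Hypothetical exam scores: mean 70, standard deviation 10.
for score in (85, 48, 105):
    z = z_score(score, 70, 10)
    print(score, z, describe(z))
```

A score of 85 gives z = 1.5, meaning it sits one and a half standard deviations above the mean.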
Standard Normal Excel Equations
In a standard normal distribution, the mean is always 0 and the standard deviation is always 1.
To find the probability when a z-score is given: =NORM.S.DIST(z, true)
Gives the probability that a z-score is at or below a given value
To find the probability that a z-score is at or above a given value: =1-NORM.S.DIST(z, true)
To find the z-score when probability is given: =NORM.S.INV(probability)
Give the probability in decimal form (example: 45% = 0.45)
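For readers who want to cross-check the Excel results outside a spreadsheet, the sketch below shows rough Python equivalents using the standard library. The pairing of each Python call with an Excel function is offered as a convenience, and the z-score of 1.96 is an arbitrary example value.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # mean 0, standard deviation 1

# Comparable to =NORM.S.DIST(1.96, true)
p_at_or_below = std_normal.cdf(1.96)

# Comparable to =1-NORM.S.DIST(1.96, true)
p_at_or_above = 1 - std_normal.cdf(1.96)

# Comparable to =NORM.S.INV(0.45); the probability is given as a decimal
z_for_45_percent = std_normal.inv_cdf(0.45)

print(round(p_at_or_below, 4), round(p_at_or_above, 4), round(z_for_45_percent, 4))
```

The two probabilities add up to 1, and the z-score for 45% is slightly negative because 45% is just below the halfway point.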
General Normal Excel Equations
In a general normal distribution, the mean can be any value and the standard deviation can be any positive value.
To find the probability that x is at or below a given datum value: =NORM.DIST(x, mean, standard deviation, true)
Gives the probability that x is at or below a given datum value
To find the probability that x is at or above a given datum value: =1-NORM.DIST(x, mean, standard deviation, true)
To find x when a probability is given: =NORM.INV(probability, mean, standard deviation)
To find the probability between two z-scores a and b:
P(a < Z < b) = P(Z < b) - P(Z < a)
=NORM.S.DIST(b, true) - NORM.S.DIST(a, true)
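The same calculations can be sketched in Python for a general normal distribution. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, and the comments pairing each line with an Excel function are a convenience for cross-checking.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100.0, sigma=15.0)

# Comparable to =NORM.DIST(115, 100, 15, true)
p_at_or_below_115 = dist.cdf(115)

# Comparable to =1-NORM.DIST(115, 100, 15, true)
p_at_or_above_115 = 1 - dist.cdf(115)

# Comparable to =NORM.INV(0.90, 100, 15)
x_at_90_percent = dist.inv_cdf(0.90)

# Probability between two values: subtract the lower CDF from the upper CDF.
p_between_85_and_115 = dist.cdf(115) - dist.cdf(85)

print(round(p_at_or_below_115, 4), round(p_at_or_above_115, 4))
print(round(x_at_90_percent, 2), round(p_between_85_and_115, 4))
```

Since 85 and 115 are one standard deviation below and above the mean, the last result matches the 68.3% from the empirical rule.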
The Central Limit Theorem
If the sample size is large enough, the sampling distribution of the sample mean will be approximately normal, even when the population itself is not normally distributed.
Sampling Distribution: The distribution of sample means for multiple random samples of the same size (n).
Conditions:
- The sample must be random
- The samples must be independent
- The sample size must be equal to or greater than 30
- The sample size must be at most 10% of the population
Example:
We took a random sample of 50 grocery stores and found that the mean price of a loaf of bread was $2.50. We then took 37 more random samples, each of 50 grocery stores, and found their means. We now have a sampling distribution of 38 means. This distribution will be approximately normal with a bell curve shape.
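A small simulation can make the theorem concrete. The sketch below does not use real bread prices; it assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1 and standard deviation 1) and draws many samples of size 50 to see how the sample means behave. The seed and the number of samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50      # n, the size of each random sample
NUM_SAMPLES = 2000    # how many sample means to collect

def sample_mean(n):
    """Mean of n draws from a skewed population (exponential, mean 1)."""
    return mean(random.expovariate(1.0) for _ in range(n))

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

center = mean(sample_means)    # should be near the population mean, 1
spread = stdev(sample_means)   # should be near 1 / sqrt(50)

# For a bell-shaped distribution, roughly 68% lies within 1 standard deviation.
share_within_1_sd = sum(
    abs(m - center) <= spread for m in sample_means
) / NUM_SAMPLES

print(round(center, 3), round(spread, 3), round(1 / sqrt(SAMPLE_SIZE), 3))
print(round(share_within_1_sd, 3))
```

Even though individual draws are far from bell-shaped, the collected sample means cluster around the population mean, with a spread close to the population standard deviation divided by the square root of n.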
Z Distribution
A Z distribution can be used when the population standard deviation is known and the sample size is equal to or greater than 30.
Symbols:
z = z-score
x̅= Sample mean
μ = Population mean
σ = Population standard deviation
n = Sample size
Equation:
z = (x̅ - μ) / (σ / √n)
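The z-score for a sample mean divides the distance between the sample mean and the population mean by the standard error, σ / √n. The sketch below works through one case; the population mean of $2.40 and population standard deviation of $0.35 are hypothetical values invented for illustration, and only the sample mean of $2.50 and n = 50 echo the bread example.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (population sd / sqrt(n))."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical population: mean $2.40, standard deviation $0.35.
z = z_for_sample_mean(sample_mean=2.50, pop_mean=2.40, pop_sd=0.35, n=50)

# Probability of seeing a sample mean at least this high, if the
# population values above were correct.
p_at_or_above = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_at_or_above, 4))
```

Under these assumed values the sample mean sits about 2 standard errors above the population mean, which the guide would call unusual.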
T Distribution
The t-distribution is used when the sample size is less than 30 or the population standard deviation is unknown.
Symbols:
t = t statistic
x̅= Sample mean
μ = Population mean
s = Sample standard deviation
n = Sample size
Equation:
t = (x̅ - μ) / (s / √n)
Excel equations:
Use degrees of freedom. df = n-1
To find the probability that a t statistic is at or below a given value: =T.DIST(x, df, true)
x = t statistic
To find the t statistic when a probability is given: =T.INV(probability, df)
Give the probability in decimal form (example: 65% = 0.65)
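Python's standard library has no built-in t-distribution, so the sketch below computes the t statistic as (x̅ - μ) / (s / √n) and then approximates the cumulative probability by numerically integrating the t density. This is an illustrative approach under stated assumptions, not how Excel computes T.DIST; the function names and the example values (sample mean 2.50, population mean 2.40, sample standard deviation 0.35, n = 16) are hypothetical.

```python
from math import gamma, pi, sqrt

def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """t = (sample mean - population mean) / (sample sd / sqrt(n))."""
    return (sample_mean - pop_mean) / (sample_sd / sqrt(n))

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    # math.gamma overflows for very large df, so this suits small samples.
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule (steps must be even)."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    # The t-distribution is symmetric about 0.
    return 0.5 + area if t > 0 else 0.5 - area

n = 16
df = n - 1  # degrees of freedom
t = t_statistic(sample_mean=2.50, pop_mean=2.40, sample_sd=0.35, n=n)
p_at_or_below = t_cdf(t, df)  # comparable to =T.DIST(t, df, true)

print(round(t, 4), df, round(p_at_or_below, 3))
```

As a sanity check, t_cdf(2.228, 10) comes out near 0.975, which agrees with the familiar t-table critical value for 10 degrees of freedom.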
Next Steps:
Now that you know how to use probability distributions to compare values with the mean, you're ready to learn how to perform hypothesis tests, build confidence intervals, and draw conclusions from them.