What are Descriptive Statistics?
This guide focuses on foundational concepts needed for Applied Statistics and Intro to Statistics, and assumes a general knowledge of how to navigate and use equations in Excel. See the Microsoft Office tech tutorial links for Excel tutorials.
Descriptive statistics, including central tendency (center), skewness (shape), and variability (spread), are foundational concepts. These descriptors are used frequently when first approaching a data set and will come up in assignments for Applied Statistics.
The Measures of Center section covers the mean, median, and mode. For each one it includes the necessary equation, defined symbols, steps for solving, an example, and the corresponding Excel equation. For an introduction to central tendency, see this guide.
The Shape section defines symmetric and skewed graphs with a visual representation of each.
The Spread, or variability, section defines variance, standard deviation, z-scores, range, quartiles, interquartile range (IQR), and box plots. This section also includes any necessary equations. For more information about box plots and the five-number summary, watch this video.
Measures of Center
Mean
The average
Symbols:
μ = Population Mean
x̄ = Sample Mean
xi = A data point
∑ = Sum of all
n = Sample Size (number of data points)
Equation: x̄ = (∑ xi) / n
- Add all data points together
- Divide by the number of data points
Example:
Find the average test score of this class of 10 students:
90 | 82 | 98 | 76 | 80 | 81 | 78 | 90 | 93 | 100 |
---|---|---|---|---|---|---|---|---|---|
- 90 + 82 + 98 + 76 + 80 + 81 + 78 + 90 + 93 + 100 = 868
- 868/10 = 86.8
Excel Equation: =AVERAGE(array)
“Array” means the range of cells that holds your data. You can highlight data in Excel by selecting the first cell in your data list, then holding and dragging your mouse down to the end of the list.
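As a cross-check outside Excel, the two steps above can be sketched in Python. This is a minimal illustration using the test scores from the example. The variable names are arbitrary, and `statistics.mean` is Python's standard-library counterpart, not something this guide requires.

```python
from statistics import mean

# Test scores from the example above
scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]

total = sum(scores)        # step 1: add all data points together
n = len(scores)            # number of data points
sample_mean = total / n    # step 2: divide by the number of data points

print(total, n, sample_mean)  # 868 10 86.8
print(mean(scores))           # 86.8, same result from the standard library
```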
Median
The middle value in a sorted set (including repeat numbers)
For a data set with an odd number of values: Use the middle value
For a data set with an even number of values: Take the average of the two middle values
How to find the median:
- Sort the data
- The position of the middle value = (n+1)/2
Example:
Find the position of the middle value for a data set of 25 values
(25+1)/2 = 26/2 = 13
The median is the 13th data value
Find the position of the middle value for a data set of 26 values
(26+1)/2 = 27/2 = 13.5
The median is the average of the 13th data value and the 14th data value
Excel Equation: =MEDIAN(array)
“Array” means the range of cells that holds your data. You can highlight data in Excel by selecting the first cell in your data list, then holding and dragging your mouse down to the end of the list.
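The position rule (n+1)/2 can be sketched in Python as follows. `median_position` is a hypothetical helper named only for this illustration, and the second half reuses the ten test scores from the Mean example to show the even-sized case.

```python
from statistics import median

def median_position(n):
    """1-based position of the middle value in a sorted data set of size n."""
    return (n + 1) / 2

print(median_position(25))  # 13.0 -> the 13th value
print(median_position(26))  # 13.5 -> average of the 13th and 14th values

# Even-sized example: the ten test scores from the Mean section
scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]
ordered = sorted(scores)                   # step 1: sort the data
middle_pair = (ordered[4], ordered[5])     # position 5.5 -> 5th and 6th values
print(middle_pair, median(scores))         # (82, 90) 86.0
```

A position ending in .5 is a signal to average the two neighboring values, which is what `statistics.median` does for even-sized data.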
Mode
The most frequently occurring value(s)
There can be more than one mode, or no mode
Typically used for categorical data
How to find the mode:
- Sort the data
- Make a frequency table
- Pick the value(s) with the highest frequency
Example:
Find the mode for the following data set, where each number is a code for an eye color:
1 = brown eyes
2 = blue eyes
3 = other
2 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |
---|---|---|---|---|---|---|---|---|---|
1. Sort the data
1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 |
---|---|---|---|---|---|---|---|---|---|
2. Frequency table
Eye color | Frequency |
---|---|
1-brown | 5 |
2-blue | 3 |
3-other | 2 |
The mode is 1 (brown eyes).
Excel Equation: =MODE.MULT(array)
“Array” means the range of cells that holds your data. You can highlight data in Excel by selecting the first cell in your data list, then holding and dragging your mouse down to the end of the list.
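The frequency-table approach can also be sketched in Python. This minimal example uses the eye color codes from the example above. `Counter` builds the frequency table and `statistics.multimode` returns every value tied for the highest frequency. The `labels` dictionary is just a convenience for printing.

```python
from collections import Counter
from statistics import multimode

labels = {1: "brown eyes", 2: "blue eyes", 3: "other"}
data = [2, 1, 1, 3, 2, 1, 3, 1, 1, 2]

# Frequency table: how many times each code appears
frequency = Counter(data)
for code in sorted(frequency):
    print(code, labels[code], frequency[code])
# 1 brown eyes 5
# 2 blue eyes 3
# 3 other 2

# All values tied for the highest frequency (a list, since there can be several)
modes = multimode(data)
print(modes)  # [1]
```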
Shape
Symmetric
Normal distribution
“Bell curve”
Use the mean (average) as the measure of center
Skewed
Outliers pull the tail out
Use the median as the measure of center
Left skew:
Long left tail
x̄ < median (the mean is less than the median)
Right skew:
Long right tail
x̄ > median (the mean is greater than the median)
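The mean-versus-median comparison above can be sketched in Python. This is only a rule-of-thumb check, not a formal skewness statistic. `skew_hint` is a hypothetical helper, and the three small data sets are made up purely for illustration.

```python
from statistics import mean, median

def skew_hint(data):
    """Rule-of-thumb shape check: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right skew"
    if m < med:
        return "left skew"
    return "roughly symmetric"

# Made-up illustrative data sets
right_tailed = [1, 2, 2, 3, 3, 4, 20]       # mean 5, median 3
left_tailed = [1, 17, 18, 18, 19, 19, 20]   # mean 16, median 18
balanced = [1, 2, 3, 4, 5]                  # mean 3, median 3

print(skew_hint(right_tailed))  # right skew
print(skew_hint(left_tailed))   # left skew
print(skew_hint(balanced))      # roughly symmetric
```

In both skewed examples a single outlier pulls the mean toward the tail while the median barely moves, which is why the median is preferred for skewed data.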
Spread
Variance
The average squared distance of each data point from the mean; a measure of variability
σ² = Population Variance
s² = Sample Variance
Excel Equation for Population: =VAR.P(array)
“Array” means highlight your data
Excel Equation for Sample: =VAR.S(array)
“Array” means highlight your data
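To make the difference between the population and sample versions concrete, here is a minimal Python sketch that reuses the test scores from the Mean example. It assumes the standard definitions: the population variance divides the sum of squared deviations by n, and the sample variance divides by n - 1. `statistics.pvariance` and `statistics.variance` are the standard-library counterparts.

```python
from statistics import mean, pvariance, variance

scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]

x_bar = mean(scores)
squared_deviations = [(x - x_bar) ** 2 for x in scores]

# Population variance: divide by the number of data points
population_variance = sum(squared_deviations) / len(scores)
# Sample variance: divide by one less than the number of data points
sample_variance = sum(squared_deviations) / (len(scores) - 1)

print(round(population_variance, 2), round(sample_variance, 2))  # 65.56 72.84
print(round(pvariance(scores), 2), round(variance(scores), 2))   # 65.56 72.84
```

The sample variance is always a little larger than the population variance for the same data, because it divides by a smaller number.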
Standard Deviation
A measure of the dispersion of a data set, equal to the square root of the variance
σ = Population SD
s = Sample SD
Excel Equation for Population: =STDEV.P(array)
“Array” means highlight your data
Excel Equation for Sample: =STDEV.S(array)
“Array” means highlight your data
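Since the standard deviation is the square root of the variance, the relationship can be checked with a short Python sketch. It again reuses the test scores from the Mean example. `statistics.pstdev` and `statistics.stdev` are the standard-library functions for the population and sample versions.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]

# Square root of each variance gives the matching standard deviation
population_sd = sqrt(pvariance(scores))
sample_sd = sqrt(variance(scores))

print(population_sd, pstdev(scores))  # the two population values agree
print(sample_sd, stdev(scores))       # the two sample values agree
```

Unlike the variance, the standard deviation is in the same units as the data (here, test score points), which makes it easier to interpret.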
Z-Score
How many standard deviations a value (x) is from the mean: z = (x - x̄) / s
x = Data value
x̄ = Mean
s = Standard Deviation
Excel Equation: =STANDARDIZE(data value, mean, standard deviation)
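A z-score subtracts the mean from the data value and divides by the standard deviation. The Python sketch below mirrors the three arguments of the Excel equation. `z_score` is a hypothetical helper, and the example asks how unusual a score of 100 is among the test scores from the Mean example, treating them as a sample.

```python
from statistics import mean, stdev

def z_score(x, center, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / sd

scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]
x_bar = mean(scores)   # sample mean
s = stdev(scores)      # sample standard deviation

z = z_score(100, x_bar, s)
print(round(z, 2))  # about 1.55 standard deviations above the mean
```

A positive z-score means the value is above the mean, a negative one means it is below, and a z-score of 0 means the value equals the mean.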
Range
How far the lowest data value is from the highest data value
Subtract the minimum value from the maximum value
max - min
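As a quick illustration, the range of the test scores from the Mean example can be computed in Python with the built-in `max` and `min` functions.

```python
scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]

# Subtract the minimum value from the maximum value
data_range = max(scores) - min(scores)

print(max(scores), min(scores), data_range)  # 100 76 24
```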
Quartiles
Quartiles split the sorted data into four parts, each holding 25% of the data
Excel equation for Quartiles: =quartile(array, quartile)
Quartile values for excel:
0 gives the minimum value
1 gives quartile 1
2 gives quartile 2, or the median
3 gives quartile 3
4 gives the maximum value
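The quartile values 0 through 4 can be mimicked in Python with the sketch below. `quartile` is a hypothetical helper written for this illustration. It relies on `statistics.quantiles` with `method="inclusive"`, chosen on the assumption that this interpolation corresponds to the one Excel uses. Other tools and textbooks may compute quartiles slightly differently, so small differences are possible.

```python
from statistics import quantiles

def quartile(data, q):
    """Hypothetical helper: q = 0..4, following the quartile values listed above."""
    if q not in (0, 1, 2, 3, 4):
        raise ValueError("q must be 0, 1, 2, 3, or 4")
    if q == 0:
        return min(data)
    if q == 4:
        return max(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    cut_points = quantiles(data, n=4, method="inclusive")
    return cut_points[q - 1]

# Test scores from the Mean example
scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]
print([quartile(scores, q) for q in range(5)])
# [76, 80.25, 86.0, 92.25, 100]
```

The five numbers printed here (minimum, Q1, median, Q3, maximum) are the five-number summary that a box plot displays.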
Interquartile Range (IQR)
How far quartile 1 is from quartile 3
Used in skewed distributions
Q3 - Q1
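The subtraction Q3 - Q1 can be sketched in Python for the test scores from the Mean example. As a stated assumption, the quartiles come from `statistics.quantiles` with `method="inclusive"`. A different quartile convention could give a slightly different IQR.

```python
from statistics import quantiles

scores = [90, 82, 98, 76, 80, 81, 78, 90, 93, 100]

# Three cut points: quartile 1, quartile 2 (the median), quartile 3
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1
print(q1, q3, iqr)  # 80.25 92.25 12.0
```

Because the IQR only looks at the middle 50% of the data, extreme values in the tails do not affect it, which is why it suits skewed distributions.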
Next Steps
Now that you know the basics of descriptive statistics, you’re ready to start analyzing data sets! These concepts will be used throughout your statistics course to describe data and make inferences.
Next, it’s time to learn about inferential statistics, starting with linear regression.
Need More Help?
Click here to schedule a 1:1 with a tutor or coach, or to sign up for a workshop. *If this link does not bring you directly to our platform, please use the direct link to "Academic Support" in the navigation bar at the top of any Brightspace course.