Academic Resource Center

Scatterplots and Linear Regression

How to create a scatterplot, find a regression equation, and calculate residuals and sum of square errors.

This guide covers foundational concepts needed for Applied Statistics and Intro to Statistics. It assumes a general knowledge of beginning algebra and of how to navigate and use equations and charts in Excel. See the Microsoft Office tech tutorial links for Excel tutorials.

Scatterplots help us visualize data and make inferences about the relationship between two variables. Scatterplots can show us if a relationship is linear or non-linear, positive or negative, and how strong the relationship is.

Regression Equations allow us to make predictions about one variable in response to another variable. The regression equation consists of a y-intercept and a slope. We can interpret the slope and y-intercept to make conclusions about the relationship between our variables.

Residuals show us how far each observed data point is from the value predicted by the linear model we’ve created (the regression equation). The sum of squared errors adds up the squares of those residuals and shows us how well our model represents the original data.

Scatterplots

Variables

X Variable: Independent Variable, Predictor Variable

Y Variable: Dependent Variable, Response Variable

Example:

A group of scientists want to see if varying amounts of sunlight help a plant grow faster. What is the predictor variable and what is the response variable?

Amount of sunlight is the variable that’s useful for making predictions about plant growth. So the predictor (X) variable is the amount of sunlight.

The growth rate of the plant responds to the amount of sunlight it receives. So the response variable (Y) is the plant’s growth.

We can’t use the growth of a plant to predict how much sunlight it has. We can change the amount of sunlight a plant gets and measure the impact on growth.

How to Create a Scatterplot in Excel

  1. Organize your data: Make sure your X values are in the left column and your Y values are in the right column
  2. Highlight both columns of data (X and Y)
  3. Click “Insert” from the top menu
  4. Under “Charts” choose the icon with the dots
  5. Select the first option: the graph with dots and no lines
  6. Use the “+” in the upper right corner of the scatterplot to add a trendline
  7. Click the arrow next to “Trendline” and select “More Options” to add the regression equation and R² to the graph

Linear: the data trends in a straight line

Non-Linear: the data trends in a curved (not straight) line

Positive: As X increases, Y increases

Negative: As X increases, Y decreases

Strength: Strong (a), moderate (b), or weak (c). How closely do the data points cluster around the trendline?
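Direction and strength can also be summarized with a number. The sketch below computes the correlation coefficient r, which this guide does not name; it is brought in here as an assumption. The sign of r matches the direction of the trend, values near 1 or -1 mean the points cluster tightly around a straight line, and values near 0 mean a weak linear relationship. For a straight-line trendline, the R² that Excel displays equals r squared. The data and the `pearson_r` helper are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only
hours = [1, 2, 3, 4, 5]
growth_up = [2.1, 3.9, 6.2, 7.8, 10.1]    # rises as hours rise
growth_down = [10.0, 8.1, 5.9, 4.2, 1.8]  # falls as hours rise

r_up = pearson_r(hours, growth_up)
r_down = pearson_r(hours, growth_down)

print("positive trend: r =", round(r_up, 3), " R^2 =", round(r_up ** 2, 3))
print("negative trend: r =", round(r_down, 3), " R^2 =", round(r_down ** 2, 3))
```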

Linear Regression

The Regression Equation

Symbols:

ŷ = y-hat, predicted y, fitted value

b₀ = b-sub-zero, y-intercept (the value of ŷ when x = 0)

b₁ = b-sub-one, slope coefficient, the change in ŷ for every one-unit increase in x (every time x goes up by one, what does y do?). Positive slope = positive correlation. Negative slope = negative correlation.

Equation:

ŷ = b₀ + b₁x

Excel formula: =LINEST(y values, x values)

Select the y column first, enter a comma, then select the x column.

Results will appear in two cells: the slope in the first cell and the intercept in the second.
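For readers who want to see the arithmetic behind the slope and intercept, here is a minimal sketch of the standard least-squares formulas for a straight line. The function name `fit_line` and the data are hypothetical; the data were chosen to lie exactly on the line ŷ = 24 + 3x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data for illustration only
x_values = [1, 2, 3, 4]
y_values = [27, 30, 33, 36]

slope, intercept = fit_line(x_values, y_values)
print("slope =", slope, " intercept =", intercept)
```

As with the Excel formula, the slope is reported first and the intercept second.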

Interpreting Slope

As [x variable] increases by one, [y variable] increases/decreases by [slope]

Example: The regression equation for profits depending on how many customers an online store has is: ŷ = -500 + 25x

As the number of customers increases by one, predicted profits increase by $25.

Interpreting Intercept

When [x variable] is 0, the predicted [y variable] is [intercept], which [means ______ / has no practical meaning]

Example: The regression equation for profits depending on how many customers an online store has is: ŷ = -500 + 25x

When the number of customers is zero, the predicted profits are -$500, meaning there’s $500 of overhead cost before the store makes a profit.
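The two interpretations can be checked numerically. This small sketch uses the online store equation from the examples above; the function name `predicted_profit` is made up for illustration.

```python
def predicted_profit(customers):
    """Predicted profit in dollars from the example equation: -500 + 25x."""
    return -500 + 25 * customers


# Intercept: the prediction when x is 0
profit_at_zero = predicted_profit(0)

# Slope: the change in the prediction when x goes up by one
# (10 and 11 are arbitrary; any two consecutive values give the same change)
change_per_customer = predicted_profit(11) - predicted_profit(10)

print("profit with 0 customers:", profit_at_zero)
print("change per extra customer:", change_per_customer)
```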

Predicting Y

Our regression equation is ŷ = 24 + 3x. What is the predicted y when x = 5?

Substitute 5 for x, multiply it by the slope coefficient, then add the intercept: ŷ = 24 + (3 × 5) = 24 + 15 = 39
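The same prediction can be written as a short function. This is only a sketch; the name `predict` and its default arguments are illustrative and simply restate the equation from this example.

```python
def predict(x, intercept=24, slope=3):
    """Predicted y for a given x: intercept + slope * x."""
    return intercept + slope * x


prediction = predict(5)
print("predicted y when x = 5:", prediction)  # 24 + 15 = 39
```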

Residuals

Symbols:

y = Observed y value

ŷ = Predicted y value

Equation:

y - ŷ

Observed values minus predicted values

X data | Y data (observed)       | ŷ                                            | Residuals
Given  | Given (observed values) | Use regression equation to find predicted y | y - ŷ

Example:

X Y
1 9
2 4
3 4

Regression equation: ŷ = 10 - 2x

X data | Y data (observed) | ŷ             | Residuals
1      | 9                 | 10 - 2(1) = 8 | 9 - 8 = 1
2      | 4                 | 10 - 2(2) = 6 | 4 - 6 = -2
3      | 4                 | 10 - 2(3) = 4 | 4 - 4 = 0
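The residual calculation for this example can be reproduced in a few lines. The sketch below takes the regression equation 10 - 2x as given, exactly as the example does, rather than fitting it from the data; the variable names are illustrative.

```python
x_data = [1, 2, 3]
y_observed = [9, 4, 4]


def predict(x):
    """Predicted y from the example's given equation: 10 - 2x."""
    return 10 - 2 * x


# Predicted y for each x, then residual = observed y minus predicted y
y_predicted = [predict(x) for x in x_data]
residuals = [y - y_hat for y, y_hat in zip(y_observed, y_predicted)]

print("predicted:", y_predicted)  # [8, 6, 4]
print("residuals:", residuals)    # [1, -2, 0]
```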

Sum of Square Errors

  1. Find residuals (as above)
  2. Square the residuals
  3. Sum up all the squared residuals
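Written as a single formula, the three steps look like this. The index i and the count n (the number of data points) are notation added here for illustration; they do not appear elsewhere in this guide.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```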

Example:

X data | Y data (observed) | ŷ             | Residuals  | Squared Residuals
1      | 9                 | 10 - 2(1) = 8 | 9 - 8 = 1  | 1² = 1
2      | 4                 | 10 - 2(2) = 6 | 4 - 6 = -2 | (-2)² = 4
3      | 4                 | 10 - 2(3) = 4 | 4 - 4 = 0  | 0² = 0

1 + 4 + 0 = 5

The sum of square errors is 5.
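As a check on the table, here is a minimal sketch that follows the same three steps in code. It uses the example's data and its given equation 10 - 2x; the variable names are illustrative.

```python
x_data = [1, 2, 3]
y_observed = [9, 4, 4]

# Step 1: find the residuals (observed y minus predicted y)
residuals = [y - (10 - 2 * x) for x, y in zip(x_data, y_observed)]

# Step 2: square each residual
squared_residuals = [r ** 2 for r in residuals]

# Step 3: sum the squared residuals
sse = sum(squared_residuals)

print("squared residuals:", squared_residuals)  # [1, 4, 0]
print("sum of squared errors:", sse)            # 5
```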

Next Steps:

Now you know how to read a scatterplot and interpret linear trends. These skills will allow you to make inferences about data that is linearly correlated. Linear analysis will also allow you to make predictions about a population.

But there’s more still to learn about linear regression. Find out how to calculate the slope of a line from coordinates on a graph and the strength of linear relationships in the Fitted Models guide.

Need More Help?

Click here to schedule a 1:1 with a tutor or coach, or to sign up for a workshop. *If this link does not bring you directly to our platform, please use our direct link to "Academic Support" at the top of the navigation bar in any Brightspace course.
