# Scatter Plots, Correlation, and Regression

Usually around the time that you are beginning “Algebra II”, you’ll have another lesson on Statistics that is a little more advanced than what you had earlier (in the Introduction to Statistics and Probability section).  Let’s talk about Scatter Plots, Correlation, and Regression, including how to use the Graphing Calculator.  These are fun topics, especially for those who love using the calculator!

# Scatter Plots

Again, sometimes in life, we have sets of data and we want to interpret them.  We may want to see if there is some sort of connection between two sets of data, such as the number of hours your friends study per week versus their grade point average.  It seems like the two variables would be related, so suppose you survey some of your friends to see what a graph would look like:

To make more sense of the data, let’s first order it by the number of hours of studying:

Here’s what the scatter plot looks like.  A scatter plot is just a graph of the data points, with the x values (number of hours studying each week) on the horizontal axis and the y values (grade point average) on the vertical axis:

# Correlation

Notice from the scatter plot above, generally speaking, the friends who study more per week have higher GPAs, and thus, if we were to try to fit a line through the points (a statistical calculation that finds the “closest” line to the points), it would have a positive slope.  Since the trend is that when the x values go up, the y values also go up, we call this a positive correlation, and the correlation coefficient is positive.

Note that a positive correlation doesn’t necessarily mean that a change in one variable causes the change in the other variable; there may be a third factor that causes both of the variables to change in the same way.  For example, there seems to be a strong correlation between shark attacks and ice cream sales; of course shark attacks do not cause people to buy ice cream, but in hot weather, both shark attacks and people buying ice cream are more likely to occur.

Again, correlation can be thought of as the degree to which two things relate to each other, and the correlation coefficient can be anywhere from -1 (strong negative correlation) to 1 (strong positive correlation).  A correlation coefficient at or near 0 means there’s really no linear connection at all between the two variables.

(We’ll show how to get the correlation coefficient given a set of points later using the graphing calculator.  It is another somewhat complicated statistical calculation that is addressed in more advanced Statistics classes).
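If you’re curious what that calculation looks like, here is a minimal Python sketch of the standard (Pearson) correlation coefficient formula.  The function name `correlation_coefficient` and the data lists are made up for illustration; they are not the survey data from this section.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together (sum of products of deviations from the means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustration data (NOT the survey from this section)
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up exactly in step with hours
falling = [10, 8, 6, 4, 2]   # goes down exactly in step with hours
no_trend = [1, 2, 3, 2, 1]   # up, then back down: no overall linear trend

print(correlation_coefficient(hours, rising))    # very close to 1
print(correlation_coefficient(hours, falling))   # very close to -1
print(correlation_coefficient(hours, no_trend))  # very close to 0
```

Real data will almost never land exactly on 1, -1, or 0; the closer r is to either end, the more tightly the points hug a line.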

Here are some examples:

(“≈” symbol means approximately equal to.)

# Regression

So going back to our original data, we can try to fit a line through the points that we have; this is called a “trend line”, “linear regression”, or “line of best fit” (as we said earlier, the line that’s the “closest fit” to the points, or the best trend line).  The formula for getting this line is a bit complicated (the “least squares method”, if you’ve heard of it) and is learned in Statistics, but you may learn how to do it with a graphing calculator.

Here’s what the line looks like through our data:

# Using Graphing Calculator to Get Line of Best Fit

You can put the data in the graphing calculator and have the points graphed, and also get the equation for the best fit trend line.  You can then graph this line over the points like we see above.  Pretty awesome!

To do this, you’ll first have to put the data points in “lists” in your calculator:

Now, let’s use the power of the graphing calculator to find the line of best fit for this set of data.  Again, we could do this manually using a complicated formula from Statistics, but the calculator does it so easily!  Basically, the math behind the best fit is finding the line that makes the sum of the squared vertical distances from the points to the line as small as possible.
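For those who want to peek behind the curtain, here is a minimal Python sketch of the least squares calculation for a line.  The function name `line_of_best_fit` and the data are made up for illustration (they are not the data from this section), and this is just one way to write it, not necessarily how a calculator does it internally.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustration data (NOT the survey from this section)
hours = [0, 1, 2, 3, 4]
scores = [1, 3, 4, 8, 9]

slope, intercept = line_of_best_fit(hours, scores)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # prints "y = 2.10x + 0.80"
```

Any other line through these points would have a larger total of squared vertical distances than this one.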

## Basic Stats on Data from Calculator

Before I show you how to get the line of best fit, let’s get some simple statistics on the two sets of data, like the mean, median, quartiles, and max (that we got by hand for our Box and Whisker Plot earlier in the Introduction to Statistics and Probability section).  Since we have two sets of data in our lists, we can use either 1-Var Stats to get information about just the first set of data that we put in L1, or 2-Var Stats to get information about both sets of data that we put in L1 and L2:
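To see what those numbers mean, here is a small Python sketch that computes the mean and the five-number summary (min, Q1, median, Q3, max) for one list.  The helper name `five_number_summary` and the data are made up for illustration.  Quartiles here are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); that is an assumption about the convention, and other tools may use a slightly different quartile rule.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values before the middle position
    upper = data[half + n % 2:]   # values after the middle position
    return min(data), median(lower), median(data), median(upper), max(data)

# Made-up illustration data (NOT the data from this section)
data = [7, 1, 5, 3, 2, 6, 4]

print(mean(data))
print(five_number_summary(data))  # prints (1, 2, 4, 6, 7)
```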

Now let’s go back and do the regression of our data (find the line of best fit).

Note that before you do this, you should turn on diagnostics so you can see the correlation coefficient r and also r² (which is the square of r and can be used to compare linear and non-linear regressions to see which fits best).  To do this, push “2nd 0” (CATALOG), move the cursor to DiagnosticOn, and hit ENTER twice.  You can just keep this on and not worry about it.
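Here is a rough Python sketch of one common way to think about r²: it compares how far the points are from a model’s predictions with how far they are from the plain average of the y values.  The function name `r_squared`, the data, and the two candidate models are all made up for illustration, and a calculator may compute r² differently for some non-linear regressions.

```python
def r_squared(xs, ys, predict):
    """1 - (sum of squared residuals) / (total sum of squares about the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up illustration data (NOT the data from this section)
hours = [0, 1, 2, 3, 4]
scores = [1, 3, 4, 8, 9]

def best_line(x):
    # The least-squares line for this made-up data works out to y = 2.1x + 0.8
    return 2.1 * x + 0.8

def flat_line(x):
    # A deliberately poor model: always guess the average score, 5
    return 5

print(round(r_squared(hours, scores, best_line), 4))  # prints 0.9587
print(round(r_squared(hours, scores, flat_line), 4))  # prints 0.0
```

The closer r² is to 1, the better that model follows the points, which is why it is handy for comparing different types of regression on the same data.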

Note that we show an exponential regression in the Exponential Functions section.

Learn these rules, and practice, practice, practice.