# Introduction to Statistics and Probability

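This section covers:

- Average, Mean, Median, Mode, and Range
- Box and Whisker Plot
- Stem and Leaf Plot
- Frequency Tables and Graphs
- Pie Chart
- Probability

Note that Scatter Plots, Correlation, and Regression are covered in their own section.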

Somewhere in the pre-algebra stages, you’ll get a short but fun introduction to Statistics, where you’ll cover such topics as finding the average and median of numbers, organizing data (numbers) with different types of graphs, and performing analyses on data. Statistics is the study of the collection, organization, and presentation of data.

You may also be introduced to Probability, which is the study of how likely events are to occur, or happen.

Let’s first talk about some types of measurement you’ll see in Statistics.

# Average, Mean, Median, Mode, and Range

Let’s say you go around and ask your friends to keep track of how much time they spend each week doing their homework. Fortunately, you have a lot of friends online that you can ask quite easily. So 20 of your friends reply with the following numbers:

10, 2, 15, 28, 1, 32, 12, 14, 8, 17, 22, 6, 42, 3, 14, 7, 12, 23, 20, 8  hours of studying each week

(I see that you have fairly studious friends!)

Just by glancing at the numbers, can you guess the average?  Probably not.  But it’s easy to get the average: you just add all the numbers up and divide by the total number of responses:

Average hours per week of your 20 friends = $\frac{10 + 2 + 15 + \cdots + 20 + 8}{20} = \frac{296}{20} = 14.8$ hours

Another word for the average is the mean.
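If you’d like to check the arithmetic on a computer, here is a minimal Python sketch of the "add them up and divide" recipe. The variable names (`hours`, `total`, `count`, `mean`) are just illustrative choices, not anything from a calculator or textbook.

```python
# Weekly study hours reported by the 20 friends (the data from the text)
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

total = sum(hours)     # add all the numbers up
count = len(hours)     # total number of responses
mean = total / count   # divide the sum by the number of responses

print(total, count, mean)  # 296 20 14.8
```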

Now let’s do some other fun stuff with this data. What if someone asked you for a number that is exactly in the middle of the data; in other words, the number that has just as many answers above it as below it? Let’s sort the data (put it in order) to get this number:

1   2   3   6   7   8    8   10   12   12   14   14   15   17   20   22   23   28   32   42

To get the middle number or median, cross out numbers from both ends until you arrive at the middle number or numbers.  Since we have an even number, we’ll get two “middles”, so to get the median, we’ll have to take the mean or average of those numbers:

So we end up with two numbers in the middle: 12 and 14. If we just had one number, that number would be the median, but since we have two numbers, we take the average of 12 and 14, which is $\frac{12 + 14}{2} = 13$. So the median is 13. Remember that the word “median” sounds like the word “middle”.
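The "cross out from both ends" idea can be sketched in Python as: sort the data, then pick the middle value (or average the two middle values). The function name `median` below is our own choice for illustration; this is one simple way to do it, not the only one.

```python
def median(values):
    """Return the middle value of the data (average of the two middles if even)."""
    ordered = sorted(values)          # put the data in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one true middle number
        return ordered[mid]
    # even count: average the two numbers in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

print(median(hours))  # 13.0
```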

The mode is the number or numbers that occur most often; you can have more than one mode.  In the case of our data, the modes are 8, 12, and 14, all of which occur twice in the data.  Remember that the word “mode” sounds like the word “most” (often).

The range, which is the difference between the largest number and smallest number, is 41 (42 – 1).
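Here is a small Python sketch, under the same assumptions as the data above, that counts how often each number occurs to find the modes, and subtracts the smallest value from the largest to get the range. The names `counts`, `modes`, and `data_range` are illustrative.

```python
from collections import Counter

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

counts = Counter(hours)              # how many times each number occurs
highest = max(counts.values())       # the largest number of occurrences
# every value that occurs that often is a mode (there can be more than one)
modes = sorted(value for value, c in counts.items() if c == highest)

data_range = max(hours) - min(hours)  # largest number minus smallest number

print(modes, data_range)  # [8, 12, 14] 41
```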

Later, we’ll learn how to do these cool things on our graphing calculator!

There are a couple different ways of viewing data graphically that you’ll learn in your math classes:

# Box and Whisker Plot

A box and whisker plot is a visual picture of the data that shows where the middle of the data is (the median), and how far away from the middle the other points lie.

To draw what we call a box and whisker plot, or box plot, we also need to get the lower and upper quartiles, which are like the medians of the numbers to the left and right of the median, including the median when it is one of the data values. (In our case, we didn’t have a true middle number, so the left half is the 10 smallest numbers, up to and including the second 12, and the right half is the 10 largest numbers.) To get the lower quartile, let’s start crossing out again:

We are left with 7 and 8 in the middle, and the mean is 7.5.  So the lower quartile is 7.5.

We do the same to get the upper quartile, but use the numbers to the right of the median.

We are left with 20 and 22 in the middle, and the mean is 21.  So the upper quartile is 21.

Note that the word “quartile” is related to the word “quarter”; we have divided up the data into four quarters.
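The five numbers needed for the box plot can be sketched in Python as shown here. This follows the "median of each half" approach described above, including the median in both halves when the count is odd. That convention is an assumption: calculators and software sometimes use slightly different quartile rules and may give slightly different answers. The function names are our own.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Lowest number, lower quartile, median, upper quartile, highest number."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[:(n + 1) // 2]  # left half (keeps the median if n is odd)
    upper_half = ordered[n // 2:]        # right half (keeps the median if n is odd)
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

print(five_number_summary(hours))  # (1, 7.5, 13.0, 21.0, 42)
```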

Then we plot the lowest number, lower quartile, median, upper quartile, and highest number like this; see how it looks like a box with whiskers on each side?

Can you see that $\frac{1}{4}$ of your 20 friends (5 friends) study less than 7.5 hours per week, $\frac{1}{2}$ (10 friends) less than 13 hours, and $\frac{3}{4}$ (15 friends) less than 21 hours a week? You can also see that $\frac{1}{2}$ of your friends (10 friends) study between 7.5 hours (the lower quartile) and 21 hours (the upper quartile) per week. You can also see that the highest point (42 hours) is somewhat of an outlier; this means that this point may not fit in with the rest of the data.

(Note that we could have used our graphing calculator to derive some of these values, such as the median and quartiles.  See Basic Stats on Data from Calculator in the Scatter Plots, Correlation, and Regression section to see how to do this).

# Stem and Leaf Plot

We could also draw a stem and leaf graph with the data, which resembles plant stems and leaves:

For this plot, you put the first digit (the tens) of all the numbers on the left hand side, and then put the ones on the right hand side, in order. Here you can see the same thing: since we know (from earlier) that the median is 13, we can see that there is a larger difference between the median and the largest number (42) than between the median and the smallest number (1).
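As a rough sketch, the same stem and leaf layout can be built in Python by splitting each number into its tens digit (the stem) and its ones digit (the leaf). The names `stems` and `lines` are illustrative, and single-digit numbers get a stem of 0.

```python
from collections import defaultdict

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

stems = defaultdict(list)
for value in sorted(hours):                 # sorted, so the leaves come out in order
    stems[value // 10].append(value % 10)   # tens digit -> stem, ones digit -> leaf

lines = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
# 0 | 1 2 3 6 7 8 8
# 1 | 0 2 2 4 4 5 7
# 2 | 0 2 3 8
# 3 | 2
# 4 | 2
```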

We can also see how the smaller data is more clumped together; we’ll see this next in Frequency Tables and Graphs.

# Frequency Tables and Graphs

We could also draw what we call a frequency table that shows us how many of your friends studied less than or equal to 10 hours per week, 11 to 20 hours per week, 21 to 30 hours per week, 31 to 40 hours per week and 41 to 50 hours per week.  These are called “buckets” or “classes” and each class has 10 hours in it (0 to 10 hours, and so on).  Notice that our buckets of data are a little different than the stem and leaf table above:

Then we could draw a histogram from this data, as shown below. A histogram, or frequency graph, is a graph showing the distribution of the data (where it lies).

Or even better, we can draw a relative frequency histogram, where we divide the number of friends in each “bucket” or “class” by the total number of friends, as shown below.  The reason this is a better graph is that if you add up all the amounts for each bucket, we’ll get a grand total of 1 (all the decimals on the left add up to 1), so we can compare different sets of data together.
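To see where the frequency and relative frequency numbers come from, here is a minimal Python sketch that tallies the study-hours data into the five buckets described above and then divides each count by the total number of friends. The bucket boundaries are taken from the text; the variable names are our own.

```python
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

# Each bucket is (lowest hours, highest hours), both ends included
buckets = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

# Frequency: how many friends fall in each bucket -> [8, 7, 3, 1, 1]
frequencies = [sum(1 for h in hours if low <= h <= high)
               for low, high in buckets]

# Relative frequency: each count divided by the total number of friends
relative = [f / len(hours) for f in frequencies]

for (low, high), f, r in zip(buckets, frequencies, relative):
    print(f"{low:>2} to {high:>2} hours: {f} friends, relative frequency {r:.2f}")
```

Notice that the number 10 lands in the first bucket here, while in the stem and leaf plot it sits with the other numbers in the tens.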

This data is skewed to the right a bit (the right hand side has a longer “tail”).  When data is skewed to the left, the left side has a longer “tail”.  When data isn’t skewed left or right, we call the data “symmetric”.

When the data is skewed right or left, the median is a better measure of the central tendency (average) for the data, since the mean could be misleading. The mean tends to get pulled out toward the “tail”; in our study-hours data, for example, the mean (14.8 hours) is larger than the median (13 hours). Here are some examples of other data that show this (means and medians may not be accurate; they are just to give an idea):

# Pie Chart

One more type of graph that’s fun to draw is a circle graph, or pie chart.  We could divide up the “buckets” from our data above (numbers of hours our friends are studying per week), and compute how big to make the pieces of the pie with the use of a little bit of Geometry.

Since we know the percentages of friends who fall into each category from the relative frequency chart above, we can get the angle measurements of each piece of the pie (from the center of the pie) by using proportions and the fact that there are 360 degrees in a circle. By using proportions (for example, for the bucket of 10 hours or less, which holds 8 of the 20 friends, $\frac{0.40}{1} = \frac{x}{360}$, so $x = 144$ degrees), we find that we can just multiply the relative frequency by 360 degrees to get each angle measurement. Here’s the table again, with the degrees for each bucket:
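The "multiply the relative frequency by 360" step can be sketched in Python like this. The bucket counts below come from tallying the 20 study-hours values into the buckets from the frequency table; the labels and variable names are illustrative.

```python
# Number of friends in each bucket of weekly study hours
bucket_counts = {
    "0 to 10": 8,
    "11 to 20": 7,
    "21 to 30": 3,
    "31 to 40": 1,
    "41 to 50": 1,
}

total = sum(bucket_counts.values())   # 20 friends in all

# Angle of each pie piece: (count / total) of the full 360 degrees
angles = {label: count * 360 / total
          for label, count in bucket_counts.items()}

for label, angle in angles.items():
    print(f"{label} hours: {angle} degrees")
# 0 to 10 hours: 144.0 degrees
# 11 to 20 hours: 126.0 degrees
# 21 to 30 hours: 54.0 degrees
# 31 to 40 hours: 18.0 degrees
# 41 to 50 hours: 18.0 degrees
```

The angles add up to 360 degrees, which is a handy check that no piece of the pie was left out.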

So (using our handy-dandy protractor that we learned how to use in Elementary School Geometry, and will see later in our Geometry section) – here is our pie chart:

Don’t worry if you don’t totally get all this now! Later on in more advanced Algebra we’ll learn even more ways to display and interpret data (like when we compare two sets of data), and also how to make a lot of these displays on your graphing calculator.

# Probability

Before Algebra, you may also have studied a topic called Probability, which is related to Statistics.

Basically, a probability is a number between 0 and 1 that tells us how likely something is to occur. Probabilities are often given as percentages, so 0% is the same as 0 and 100% is the same as 1. Have you ever heard the expression “The probability of my passing this course is about 0%”? That’s not a good sign for passing that course.

Probability can get complicated in advanced courses (we’ll visit some of this later), but we’ll just talk about a few “counting” techniques and how to compute some basic probabilities.

A probability is defined as a fraction: the number of ways the thing we want can occur, over the total number of (equally likely) things that can occur. For example, the probability of getting a head if you flip a coin is $\frac{1}{2}$, since only one of the outcomes is a head, but 2 things could have happened (the head or the tail). It’s a little confusing, but you’ll get it after a while. Flipping the coin is an experiment, since it involves chance.

Experimental probabilities are those you get by actually doing an experiment (like flipping the coin above). You’d have to do this many, many times (try it!) to get close to the theoretical probability, which is the probability we get through mathematics.

An example of trying an experimental probability is to flip a coin 40 times and record whether you get a head or a tail.  At each coin toss, add up the number of heads so far, and divide by number of flips so far, to get the experimental probability each time.  Notice how it gets closer to the theoretical probability .5 (more reliably – less variability) as you get closer to 40 coin tosses.

I just did this experiment with flipping a penny and checking the experimental probability that I get heads at each coin toss (total number of heads so far, divided by total number of flips so far).  Notice how, even though the experimental probability doesn’t end up at exactly .5, the trend is that it gets closer to .5 (with less variance or deviation) the more times I flip the coin:

If we did this experiment, say, 2000 times, our experimental probability would be reliably very, very close to the theoretical probability of .5. (You might try this for a science experiment!)
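If you don’t have a penny handy, the coin-flipping experiment can be imitated on a computer. The sketch below is only a simulation, assuming Python’s built-in random number generator is a fair stand-in for a coin; the function name `running_heads_probability` and the `seed` parameter are our own inventions. Your numbers will differ from run to run unless you fix the seed.

```python
import random

def running_heads_probability(num_flips, seed=None):
    """Simulate coin flips; return the experimental probability after each flip."""
    rng = random.Random(seed)       # a seed makes the "experiment" repeatable
    heads = 0
    history = []
    for flip in range(1, num_flips + 1):
        if rng.random() < 0.5:      # treat this as "heads" (a fair coin)
            heads += 1
        # total number of heads so far, divided by total number of flips so far
        history.append(heads / flip)
    return history

for n in (40, 2000):
    history = running_heads_probability(n, seed=1)
    print(f"after {n} flips: experimental probability = {history[-1]:.3f}")
```

Printing the whole `history` list for 40 flips shows the same kind of pattern described above: big jumps at first, then smaller and smaller wiggles around .5.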

Something that has no chance of happening has a probability of 0 (like the probability of getting a 7 when you roll a die), and something that will always occur has a probability of 1 (like the probability of getting a number in between and including 1 and 6 when you roll a die).
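Here is a small Python sketch of the "ways it can occur over possible outcomes" fraction, assuming every outcome is equally likely. The helper name `probability` is hypothetical; `Fraction` comes from Python’s standard library and keeps the answers as exact fractions.

```python
from fractions import Fraction

def probability(event, outcomes):
    """Ways the event can occur, over the number of equally likely outcomes."""
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

coin = ["head", "tail"]
die = range(1, 7)     # the numbers 1 through 6

print(probability(lambda side: side == "head", coin))   # 1/2
print(probability(lambda roll: roll == 7, die))          # 0
print(probability(lambda roll: 1 <= roll <= 6, die))     # 1
```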

Probabilities usually involve some sort of counting to put on the top or bottom of the fraction.  For example, let’s say we have 3 shirts, 2 skirts, and 2 pairs of shoes that we’ve taken on a vacation.  We want to know the probability of picking our sleeveless blue shirt, with our purple skirt, with our platform sandal shoes for that day.

So do you see that the total number of things that you can get, or outcomes, is 3 times 2 times 2, which would be 12?  Think about it – for the first shirt, you could wear one of two skirts, and one of two pairs of shoes, for the second shirt, the same thing, and so on.  You could draw a “tree” diagram like this:

So to get that combination (sleeveless blue shirt, purple skirt, and sandals), there would be 1 way out of 12 possible ways, so the probability would be $\frac{1}{12}$!
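The tree of outfits can also be listed out by a computer. In this sketch, only the sleeveless blue shirt, purple skirt, and platform sandals come from the example; the other clothing names are made up just to fill out the 3 shirts, 2 skirts, and 2 pairs of shoes.

```python
from fractions import Fraction
from itertools import product

# Only the first item in each list is from the example; the rest are made-up names
shirts = ["sleeveless blue shirt", "white t-shirt", "striped shirt"]
skirts = ["purple skirt", "denim skirt"]
shoes = ["platform sandals", "sneakers"]

# Every (shirt, skirt, shoes) combination, like the branches of the tree diagram
outfits = list(product(shirts, skirts, shoes))

wanted = ("sleeveless blue shirt", "purple skirt", "platform sandals")
chance = Fraction(outfits.count(wanted), len(outfits))

print(len(outfits), chance)  # 12 1/12
```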

Here are some more examples of probability:

Again, don’t worry if you don’t get all this now; it’s just important to get the main concepts.

Learn these rules, and practice, practice, practice!


On to Introduction to Algebra – you’re ready!
