Basic statistical terms

Discrete data

This is data which can be counted, e.g. number of legs, number of sunny days in June.

Discrete data is restricted to certain values, often whole numbers.

Discrete data can be ordinal or nominal.

Nominal data

(Unordered category data)

This is data which can be counted but not ordered.

Examples

Make of car, pets owned, flavours of bags of crisps sold.

The data is often displayed as a bar chart, pie chart or pictogram.

This is because no numerical relationship can be deduced between the categories.

(E.g. what is halfway between dogs and cats?)

[Figure: example of a bar chart]

[Figure: example of a pie chart]

 

Pictograms

Pictograms represent data using pictures or symbols.

 

Example

Amy the cat likes eating salmon pouches.

How many pouches did she eat over a week?

[Figure: pictogram of Amy's salmon pouches]

Amy ate a total of 25 salmon pouches.

 

 

Ordinal data

(Ordered category data)

This is data whose categories can be counted and ordered.

Example

The following bar chart shows the responses to the question "How much do you like dogs?"

[Figure: bar chart of the responses, ordered from Love to Hate]

The attitudes can be ordered from Love to Hate, but there is still no way to form a numerical relationship between the categories.

 

Continuous data

This is data which can be measured.

Examples

Length of hair, amount of rainfall in June.

Continuous data can fall anywhere within the data range.

 

Line graphs can be used for continuous data, because there is a numerical relationship between the values.

To make a line graph, plot each point with a dot, then join the dots with straight lines.

Always use a ruler!

Example

Rainfall data for a location, measured over one year:

[Figure: monthly rainfall data and line graph]

 

The graph shows that the location was wetter in the first four months of the year than the rest of the year, with July being the driest month.

 

 

Quantitative data

This is data which can be counted or measured in numbers. It can be either continuous or discrete.

Qualitative data

This is data which describes a quality or category, e.g. "He is tall", "She has blue eyes".

Population

A population is the complete set of persons, values or things for which data is being collected. For example, all dogs in the UK would form the target population for a statistical analysis on dog food preferences.

Census

A census collects data from the whole population.

Sample

A sample collects  data from a part of the population.

Average

 

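There are three commonly used averages: the mean, median and mode.

Mean
(also called the arithmetic average)

The mean is the sum of all the values divided by the number of values.

Mode

The mode is the value which occurs most often.

Median

The median is the middle value when the data is placed in order. If there is an even number of values, it is halfway between the two middle values.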

 

Range

The range is the difference between the highest and lowest numbers.
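
As a rough illustration, the three averages and the range can be checked with Python's standard statistics module. The data set below is made up purely for this sketch and does not come from the examples on this page.

```python
from statistics import mean, median, multimode

# Hypothetical data set, chosen only for illustration.
scores = [3, 5, 5, 6, 8, 9]

average = mean(scores)               # sum of the values / number of values
middle = median(scores)              # halfway between 5 and 6 (even count)
most_common = multimode(scores)      # list of the most frequent value(s)
spread = max(scores) - min(scores)   # highest minus lowest

print(average, middle, most_common, spread)
```

multimode returns a list because a data set can have more than one mode.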


Stem and leaf diagram

A stem and leaf diagram allows data to be recorded quickly and simple statistics to be found.

A stem and leaf diagram must have

Example

[Figure: list of test scores]

Use tens for the stem and units for the leaves.

Put the data in order!

[Figure: ordered stem and leaf diagram of the test scores]

From the diagram, it can be seen that the median score is 68% and that the modal group is the seventies percentage range, since five people got a score between 70 and 78%.

If the pass mark was 50%, it can be seen that 10/15 or 2/3 of the pupils passed the test.

Back to back stem and leaf diagrams

Sometimes, two sets of data must be recorded and compared. A back to back stem and leaf diagram helps quick comparison.

This time, the stem is in the centre, with the leaves for each data set on either side.

The first set of data is read as normal, from the centre to the right.

The second set of data is read backwards, from the centre to the left.

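Example

[Figure: test scores for class 1 and class 2]

Put the data in order!

[Figure: ordered back to back stem and leaf diagram for class 1 and class 2]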

 

From the diagram, it can be seen that the median score for class 2 is 45% and that the modal group is the forties percentage range, since seven people got a score between 43 and 48%. The lowest score for class 2 is 9% and the highest is 60%.

If the pass mark was 50%, it can be seen that class 1 did far better than class 2.

Dot Plots

A dot plot lets you see how the data is spread. A dot is placed for each piece of data. The mode can be seen quickly.

A dot plot must have

 

Example

Pulse rate of patients attending a clinic

[Figure: pulse rate data and dot plot]

The mode is 68.
Most of the data lies between 65 and 72 beats per minute.

 

Box Plots

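A box plot also lets you see how the data is spread.
It is formed from a five-figure summary: the lowest value (L), the lower quartile (Q1), the median (Q2), the upper quartile (Q3) and the highest value (H).
A box is drawn from Q1 to Q3 with a line at Q2, and tails go out to L and H.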

A box plot must have

Example 

Pulse rate of patients attending a clinic

[Figure: pulse rate data and box plot]

 

  

Excel example
An interactive sheet to calculate standard deviation and draw box plots.

The five-figure summary

 

H = 10
L = 2
Q1 = 5
Q2 = 7
Q3 = 8

Example
2   4   5   5   6   7   7   8   8   9   10

How many items of data are there? (n = 11)
What is the highest number? (H = 10)
What is the lowest number? (L = 2)

What is the median? (6th position, so Q2 = 7)

When n is odd, there will be (n - 1)/2 numbers in each half.
So there will be 5 numbers in each half.
The lower quartile is the median of the lower half (2 4 5 5 6), so Q1 = 5.
The upper quartile is the median of the upper half (7 8 8 9 10), so Q3 = 8.
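
The steps above can be sketched in Python. The function name five_figure_summary is a made-up helper for this illustration, and it assumes the median-of-halves method used in the example (the middle value is left out of both halves when n is odd). Other quartile conventions exist and can give slightly different answers.

```python
from statistics import median

def five_figure_summary(data):
    """Return (L, Q1, Q2, Q3, H) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2                  # size of each half; the middle value is left out when n is odd
    lower = values[:half]
    upper = values[n - half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

summary = five_figure_summary([2, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10])
print(summary)
```

For the eleven numbers in the example this gives L = 2, Q1 = 5, Q2 = 7, Q3 = 8 and H = 10.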

This information can be represented as a box plot:

[Figure: box plot with L = 2, Q1 = 5, Q2 = 7, Q3 = 8 and H = 10]

When comparing distributions, it is useful to know

Each quartile represents 25% of the data, so 50% of the data lies between Q1 and Q3.

Interquartile range = Q3 - Q1

Semi-interquartile range = 1/2 (Q3 - Q1)
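
As a quick check of the two formulas, here is a short Python sketch using the quartiles Q1 = 5 and Q3 = 8 from the five-figure summary on this page. The variable names are only illustrative.

```python
q1, q3 = 5, 8  # quartiles from the five-figure summary on this page

interquartile_range = q3 - q1
semi_interquartile_range = interquartile_range / 2

print(interquartile_range, semi_interquartile_range)
```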

Example

Compare the two sets of maths results.

[Figure: box plots of the maths results for paper 1 and paper 2]

Paper 2 has a better score overall, since more than three quarters of the candidates got a score of 50 or more, whereas only 50% of those sitting paper 1 got between 50 and 80%.

Paper 2 has a larger spread of marks and a median of 70%, but both papers have the same interquartile range.

           

 Standard Deviation

This is a measure of how much the data varies from the mean.

A standard deviation of zero indicates that every data value is equal to the mean.

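Two formulae are given by the SQA to calculate the standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad \text{or} \qquad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

where $\bar{x}$ is the mean and $n$ is the number of data values.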

 

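With the first equation: find the mean, subtract it from each value, square each difference, add the squares, divide by n - 1 and take the square root.

With the second equation: add the values to find Σx, add the squares of the values to find Σx², then substitute both totals into the formula.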

Example

Calculate the standard deviation of the numbers
105  133  142  185  186

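First Equation

Mean = (105 + 133 + 142 + 185 + 186) / 5 = 751 / 5 = 150.2

Σ(x - mean)² = (-45.2)² + (-17.2)² + (-8.2)² + 34.8² + 35.8² = 4898.8

s = √(4898.8 / 4) = √1224.7 ≈ 35.0

Using equation 2

Σx = 751 and Σx² = 117699

s = √((117699 - 751² / 5) / 4) = √((117699 - 112800.2) / 4) = √1224.7 ≈ 35.0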
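
The calculation can be checked with a short Python sketch. It assumes the two formulae are the usual sample standard deviation formulae, which divide by n - 1. The variable names are illustrative, and statistics.stdev from the standard library is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

data = [105, 133, 142, 185, 186]
n = len(data)

# First formula: square root of sum((x - mean)^2) / (n - 1)
mean = sum(data) / n
s1 = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Second formula: square root of (sum(x^2) - (sum(x))^2 / n) / (n - 1)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s2 = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(round(s1, 1), round(s2, 1), round(stdev(data), 1))
```

Both formulae should give the same answer. The second avoids working out each deviation from the mean, which is often quicker by hand.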


Adding the same amount to every number, or subtracting the same amount from every number, does not affect the standard deviation.

Example

The standard deviation of

101  105  133  142  185  186

is the same as the standard deviation of

1   5   33  42  85  86

and

4   8   36  45  88  89
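
This can be confirmed with a small Python sketch, using statistics.stdev (the sample standard deviation) on the three lists above. The second and third lists are the first with 100 and 97 subtracted from every value. The variable names are only illustrative.

```python
from math import isclose
from statistics import stdev

original = [101, 105, 133, 142, 185, 186]
minus_100 = [x - 100 for x in original]   # 1 5 33 42 85 86
minus_97 = [x - 97 for x in original]     # 4 8 36 45 88 89

results = [stdev(original), stdev(minus_100), stdev(minus_97)]
print(results)
```

Shifting every value moves the mean by the same amount, so each value's distance from the mean is unchanged.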

 

Frequency Tables

These are a useful way of collating raw data, to quickly see the mode, find the median  and calculate the mean.

Example

A manufacturer claims that each packet of Shazbo contains 20 sweets on average.

When 30 packets of Shazbo are examined, the results are as follows:

No. of sweets per packet

18  17  22  19   20   20   21  19  18   20

21  19  21  19   20   20   20  17  19   21

22  18  17  16   20   20   20  21  21  20

Is the manufacturer correct ?

 

Construct a frequency table of the data.

 

No. of sweets    Frequency
16               1
17               3
18               3
19               5
20               10
21               6
22               2
Total            30

The table shows that the mode of the sample is 20 sweets, which has a frequency of 10.

The median value lies halfway between the 15th and 16th values.

The frequency column shows that the first 12 values have between 16 and 19 sweets, and the next 10 values all have 20 sweets, so the 15th and 16th values both have 20 sweets.

The median is therefore 20 sweets.

 

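To calculate the mean, we need to add another column and multiply the number of sweets by the frequency.

No. of sweets    Frequency    Sweets × frequency
16               1            16
17               3            51
18               3            54
19               5            95
20               10           200
21               6            126
22               2            44
Total            30           586

Mean = 586 / 30 = 19.53...

The mean is about 19½ sweets, which could be rounded to 20 sweets.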

The manufacturer is correct !!
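
The tallying can be sketched in Python with collections.Counter, using the 30 packet counts listed in the example. The variable names are only illustrative.

```python
from collections import Counter
from statistics import median

sweets = [
    18, 17, 22, 19, 20, 20, 21, 19, 18, 20,
    21, 19, 21, 19, 20, 20, 20, 17, 19, 21,
    22, 18, 17, 16, 20, 20, 20, 21, 21, 20,
]

frequency = Counter(sweets)                         # number of sweets -> frequency
mode, mode_frequency = frequency.most_common(1)[0]
total_sweets = sum(value * count for value, count in frequency.items())
mean = total_sweets / len(sweets)

# Print the frequency table: value, frequency, value x frequency.
for value in sorted(frequency):
    print(value, frequency[value], value * frequency[value])
print(mode, mode_frequency, median(sweets), round(mean, 2))
```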

Cumulative Frequency

Cumulative frequency is used to show the running total.
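
As a minimal sketch, a running total can be built in Python with itertools.accumulate. The frequencies below are a tally of the Shazbo packet data from the frequency table example. Whether the worked example that follows uses the same data is an assumption.

```python
from itertools import accumulate

# Frequencies of 16 to 22 sweets per packet, tallied from the Shazbo sample.
sweets_per_packet = [16, 17, 18, 19, 20, 21, 22]
frequency = [1, 3, 3, 5, 10, 6, 2]

cumulative = list(accumulate(frequency))   # running total of the frequencies

for value, total in zip(sweets_per_packet, cumulative):
    print(value, total)
```

The last cumulative frequency always equals the total number of data items, here 30.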

Example

[Figure: cumulative frequency table]

A cumulative frequency diagram, or ogive, is an S-shaped plot.

[Figure: cumulative frequency diagram]

 

It is useful for finding the quartiles.

Use the y-axis scale to find the cumulative frequency at each quartile (one quarter, one half and three quarters of the total). Read across to the curve, then read down to the corresponding x value.

[Figure: cumulative frequency diagram with the quartiles marked]

From the diagram, Q2 = 19.2 (approx), Q1 = 18.1 (approx) and Q3 = 20.1 (approx).

This gives an SIQR of (20.1 - 18.1) / 2 = 1 (approx), which shows that the data does not vary hugely.

Excel example
An interactive sheet to draw cumulative frequency diagrams.

Grouped Frequency tables and mid point class intervals

These are used when the data is sorted into intervals. 

Example

The scores for an S4 homework are shown below as percentages.

 

[Figure: list of homework scores]

Firstly, the data is sorted into equal intervals.

[Figure: grouped frequency table with mid-point values]

The modal interval is 30 to 39.

The mean is 1630 / 30 = 54.3 (to 1 decimal place).

The median is halfway between the 15th and 16th items.

This is found by adding the frequencies, and occurs in the interval 51 to 60.
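
The mid-point method can be sketched in Python as follows. The intervals and frequencies here are hypothetical, chosen only to show the technique, and are not the S4 homework data. The mean found this way is an estimate, because each value is replaced by the mid-point of its interval.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
groups = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]

total_frequency = sum(f for _, _, f in groups)

# Use the mid-point of each interval to stand in for every value in it.
total = sum(((low + high) / 2) * f for low, high, f in groups)
estimated_mean = total / total_frequency

# The modal interval is the one with the highest frequency.
modal_interval = max(groups, key=lambda g: g[2])[:2]

print(estimated_mean, modal_interval)
```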

 

Relative frequency

The relative frequency of an item is the fraction of the data that it accounts for. It can be used to predict amounts.

To find the relative frequency, divide the frequency for the particular item by the total frequency of the data.

The total of all the relative frequencies must always be 1.

 

Example

The following vehicles were sold in Dogland.

[Figure: table of vehicle sales in Dogland]

How many Woofers are expected to be sold per 1000 vehicles?

[Figure: relative frequency calculation for Woofers]

Woofers account for 12.5% of the vehicle sales, so for every 1,000 vehicles sold by the dealer in Dogland, 0.125 × 1000 = 125 of them would be expected to be Woofers.

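To display the data as a pie chart, multiply each relative frequency by 360° to find the angle of its sector. For example, Woofers would have a sector of 0.125 × 360° = 45°.

[Figure: table of sector angles]

[Figure: pie chart of vehicle sales]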
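
The same calculations can be sketched in Python. The sales figures and the names "Barker" and "Howler" are hypothetical, made up so that Woofers account for 12.5% of sales as in the example. The actual Dogland table is not reproduced here. Fractions are used so that the totals are exact.

```python
from fractions import Fraction

# Hypothetical sales figures, chosen so that Woofers make up 12.5% of sales.
sales = {"Woofer": 25, "Barker": 75, "Howler": 100}
total = sum(sales.values())

# Relative frequency = frequency / total frequency
relative = {name: Fraction(count, total) for name, count in sales.items()}

expected_per_1000 = {name: rf * 1000 for name, rf in relative.items()}
sector_angle = {name: rf * 360 for name, rf in relative.items()}

print(sum(relative.values()))   # relative frequencies always total 1
print(expected_per_1000["Woofer"], sector_angle["Woofer"])
```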

 

relative frequency
An interactive sheet to draw relative frequency diagrams.

 

Scatter Diagrams

Scatter graph: Positive correlation

[Figure: scatter graph showing positive correlation]

This is positive, since the data rises from left to right.

Scatter graph: Negative correlation

[Figure: scatter graph showing negative correlation]

This is negative, since the data drops from left to right.

The more maths missed, the lower your score!

Scatter graph: No correlation

[Figure: scatter graph showing no correlation]

There is no correlation, since the data is scattered with no clear trend.

Your maths score does not depend on your shoe size!

Line of  Best  Fit

A line of best fit is drawn on a scatter graph of empirical data. It is a straight line which passes as close as possible to the data points, with roughly equal numbers of points above and below the line.

In science classes, the mean of the data is often plotted and used as a point on the line.

Once the line has been drawn and extended back to the y-axis, the gradient can be calculated.

The equation of the line can then be found using y = mx + c, where m is the gradient and c is the y-intercept.

This equation can then be used to make predictions.

Example

Test  scores for an S4 maths and physics test are shown below:-

A) Is there a correlation between scoring well in maths and physics ?

Explain your answer.

B) Draw a line of best fit on the scatter graph and use it to find an equation  linking the physics and  maths test results for this data set.

C) Use your equation to predict the physics test score for a pupil who scored 55 in the maths test.

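[Figure: table of maths and physics test scores]

The data is plotted on a scatter graph.

[Figure: scatter graph of physics score against maths score]

A) From the graph, a positive correlation exists, since the data slopes upwards in a line from left to right. Scoring well in Physics suggests scoring well in Maths.

[Figure: scatter graph with the line of best fit drawn]

B) The equation of the line is y = 0.8x + 13, where x is the maths score and y is the physics score.

C) y = 0.8 × 55 + 13 = 44 + 13 = 57

A pupil with a maths score of 55 has a predicted physics score of 57.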
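
As a small sketch, the prediction can be written as a Python function. The name predict_physics is made up for this illustration. The gradient 0.8 and intercept 13 are taken from the equation of the line found in part B.

```python
def predict_physics(maths_score, gradient=0.8, intercept=13):
    """Predict a physics score from a maths score using y = mx + c."""
    return gradient * maths_score + intercept

print(predict_physics(55))
```

Predictions are most trustworthy for maths scores inside the range of the plotted data.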

line of best fit
An interactive sheet to draw the line of best fit.

 

 

Probability

Probability is the chance of something happening.

All probabilities lie between 0 (impossible) and 1 (certain).

[Figure: probability scale from 0 to 1]

Probability can be written as a fraction, decimal or percentage.

Example

An even chance can be written as 0.5, 50% or ½.

The probability of an event happening is equal to the number of ways the event can happen divided by the total number of possible outcomes.

P(event) = number of ways the event can happen / total number of possible outcomes

Example

A bag contains 5 blue, 6 green and 4 red counters.

What is the probability that a counter picked at random will be green ?

There are 5 + 6 + 4 = 15 counters, so P(green) = 6/15 = 2/5

Example

What is the probability that a card picked at random from a standard pack of 52 playing cards will be:

A red card?

An Ace?

The King of Spades?

P(red card) = 26/52 = 1/2

P(Ace) = 4/52 = 1/13

P(King of Spades) = 1/52

Mutually exclusive events cannot happen at the same time.

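An event happening and the same event not happening are mutually exclusive, so P(not A) = 1 - P(A).

Example

What is the probability that a card picked at random from a standard pack of 52 playing cards will not be an Ace?

P(not an Ace) = 1 - 4/52 = 48/52 = 12/13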

The addition law

This is used to find the probability of one or other of two mutually exclusive events happening.

P(A or B) = P(A) + P(B)

Example

A bag contains 5 blue, 6 green and 4 red counters.

What is the probability that a counter picked at random will be green or red ?

P(green or red) = 6/15 + 4/15 = 10/15 = 2/3

The multiplication law

This is used to find the probability of two independent events both happening.

P(A and B) = P(A) × P(B)

Example

A man is playing a game which involves spinning  a wheel.

The wheel has 36 slots, numbered 1 to 36, evenly coloured red or black.

What is the probability that the ball will land on a black number which is between 10 and 20, exclusive ?

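There are 9 numbers between 10 and 20 exclusive (11 to 19), so P(between 10 and 20) = 9/36 = 1/4.

Half of the slots are black, so P(black) = 1/2.

Assuming the colour of a slot is independent of its number, P(black and between 10 and 20) = 1/2 × 1/4 = 1/8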
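
Both laws can be sketched in Python with exact fractions, using the counters and the wheel from the examples. The helper p is made up for this illustration. The wheel calculation assumes that the colour of a slot is independent of its number, which the question implies but does not state.

```python
from fractions import Fraction

counters = {"blue": 5, "green": 6, "red": 4}
total = sum(counters.values())

def p(colour):
    """Probability of picking one counter of the given colour at random."""
    return Fraction(counters[colour], total)

# Addition law for mutually exclusive events: P(A or B) = P(A) + P(B)
p_green_or_red = p("green") + p("red")

# Multiplication law for independent events: P(A and B) = P(A) x P(B)
# Assumes the colour of a slot is independent of its number.
p_black = Fraction(1, 2)
p_between = Fraction(len(range(11, 20)), 36)   # 11 to 19: strictly between 10 and 20
p_black_and_between = p_black * p_between

print(p("green"), p_green_or_red, p_black_and_between)
```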

Tree diagrams

Probabilities are written on the branches of the diagram and multiplied to give the probability of two events happening.

When there is more than one way of obtaining the desired outcome, the probabilities for each way are added together to find the total probability.

Example

A coin is tossed 3 times.

What is the probability that the coin will land:

heads up all three times ?

tails up twice only ?

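[Figure: tree diagram for three tosses of a coin]

Total possible outcomes

HHH    THH

HHT    THT

HTH    TTH

HTT    TTT

P(heads up all three times) = 1/8, since only HHH qualifies.

P(tails up twice only) = 3/8, since HTT, THT and TTH qualify.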
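
The list of outcomes can be generated and counted in Python, as a sketch that assumes a fair coin so that all eight outcomes are equally likely. The helper name probability is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Every sequence of three tosses, e.g. "HHT".
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

def probability(event):
    """Fraction of the equally likely outcomes for which event(outcome) is true."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_three_heads = probability(lambda o: o == "HHH")
p_two_tails = probability(lambda o: o.count("T") == 2)

print(outcomes)
print(p_three_heads, p_two_tails)
```

Counting outcomes gives the same answers as multiplying along the branches of the tree diagram and adding the results for each qualifying path.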

 

Two way Tables

 

Example

The following table shows the crisps preferences by brand and flavour of 60 people.

Are more of the people likely to choose Cheese and Onion as a flavour than BigDog Crisps as a brand ? Give a reason for your answer.

 

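[Figure: two way table of crisp preferences by brand and flavour]

First, add total columns.

[Figure: two way table with row and column totals added]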

Now work out the probability for each part of the question.

There are a total of 23 BigDog crisps chosen out of the 60 bags of crisps.

P(BigDog) = 23/60 ≈ 0.383, or 38.3%

There are a total of 24 Cheese & Onion crisps chosen out of the 60 bags of crisps.

P(Cheese & Onion) = 24/60 = 0.4, or 40%

Cheese & Onion is about 1.7 percentage points more likely to be picked as a flavour than BigDog crisps are to be picked as a brand.

Example

What is the probability of a person choosing a packet of Cheese & Onion flavoured BigDog crisps ?

[Figure: calculation using the two way table]

 

© Alexander Forrest