#2.2.1
Interpret diagrams for single-variable data, including understanding that area in a histogram represents frequency.
Students should be familiar with histograms, frequency polygons, box and whisker plots (including outliers) and cumulative frequency diagrams.
Connect to probability distributions.
Histograms and frequency polygons; Box and whisker plots; Cumulative frequency diagrams
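The statement that area in a histogram represents frequency can be illustrated with a short sketch. It assumes the usual convention that bar height is frequency density (frequency divided by class width), so each bar's area equals its frequency exactly. The class boundaries and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 8), (10, 15, 12), (15, 20, 15), (20, 40, 10)]

def frequency_densities(classes):
    """Return the bar height (frequency density) for each class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

def bar_areas(classes):
    """Return width * height for each bar; this should reproduce the frequencies."""
    heights = frequency_densities(classes)
    return [(upper - lower) * h for (lower, upper, _), h in zip(classes, heights)]

print(frequency_densities(classes))  # [0.8, 2.4, 3.0, 0.5]
print(bar_areas(classes))            # areas match the frequencies 8, 12, 15, 10
```

Note that the widest class (20 to 40) has the lowest bar even though its frequency is not the smallest, which is why heights alone should not be read as frequencies.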
#2.2.2
Interpret scatter diagrams and regression lines for bivariate data, including recognition of scatter diagrams which include distinct sections of the population (calculations involving regression lines are excluded).
Students should be familiar with the terms explanatory (independent) and response (dependent) variables.
Use regression lines to make predictions within the range of values of the explanatory variable (interpolation), and understand the dangers of extrapolation. Derivations will not be required. Variables other than \(x\) and \(y\) may be used.
Change of variable may be required, e.g. using knowledge of logarithms to reduce a relationship of the form \(y = ax^n\) or \(y = kb^x\) into linear form to estimate \(a\) and \(n\) or \(k\) and \(b\).
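The change of variable can be sketched as follows. Taking logarithms of \(y = ax^n\) gives \(\log y = \log a + n \log x\), a straight line of \(\log y\) against \(\log x\); taking logarithms of \(y = kb^x\) gives \(\log y = \log k + x \log b\), a straight line of \(\log y\) against \(x\). The sketch below assumes base-10 logarithms and that the gradient and intercept of the line are already given (for example, read from a graph). The function names and the numerical values are hypothetical.

```python
def power_law_from_line(gradient, intercept):
    """For y = a * x**n, the line log10(y) = intercept + gradient * log10(x)
    has gradient n and intercept log10(a). Returns (a, n)."""
    return 10 ** intercept, gradient

def exponential_from_line(gradient, intercept):
    """For y = k * b**x, the line log10(y) = intercept + gradient * x
    has gradient log10(b) and intercept log10(k). Returns (k, b)."""
    return 10 ** intercept, 10 ** gradient

# Hypothetical line: log10(y) = 0.3010 + 2 * log10(x)
a, n = power_law_from_line(gradient=2, intercept=0.3010)
print(round(a, 2), n)

# Hypothetical line: log10(y) = 1 + 0.5 * x
k, b = exponential_from_line(gradient=0.5, intercept=1)
print(k, round(b, 3))
```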
Understand informal interpretation of correlation.
Use of terms such as positive, negative, zero, strong and weak is expected.
Understand that correlation does not imply causation.
#2.2.3
Interpret measures of central tendency and variation, extending to standard deviation.
Data may be discrete, continuous, grouped or ungrouped. Understanding and use of coding.
Measures of central tendency: mean, median, mode.
Measures of variation: variance, standard deviation, range and interpercentile ranges.
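The effect of coding on the mean and standard deviation can be sketched as below. The sketch assumes a linear coding of the form \(y = \dfrac{x - a}{b}\), for which the mean of \(x\) is \(a + b\bar{y}\) and the standard deviation of \(x\) is \(|b|\) times the standard deviation of \(y\). The data, the constants and the function name `decode_summary` are hypothetical; `pstdev` uses the divisor \(n\).

```python
from statistics import mean, pstdev

def decode_summary(coded_mean, coded_sd, a, b):
    """Undo the coding y = (x - a) / b.

    Returns (mean of x, standard deviation of x).
    """
    return a + b * coded_mean, abs(b) * coded_sd

# Hypothetical data, coded using y = (x - 1000) / 10.
x = [1010, 1020, 1040, 1050]
y = [(value - 1000) / 10 for value in x]

x_mean, x_sd = decode_summary(mean(y), pstdev(y), a=1000, b=10)
print(x_mean, mean(x))   # both give the mean of the original data
print(x_sd, pstdev(x))   # both give the standard deviation of the original data
```

Subtracting \(a\) shifts the mean but leaves the spread unchanged, while dividing by \(b\) rescales both.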
Use of linear interpolation to calculate percentiles from grouped data is expected.
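Linear interpolation for a percentile can be sketched as follows. The sketch assumes that values are spread evenly within each class and that the \(p\)th percentile is taken at position \(\dfrac{p}{100} \times n\) in the ordered data, which is one common convention for grouped data. The function name and the data are hypothetical.

```python
def percentile_grouped(classes, p):
    """Estimate the p-th percentile (0 < p <= 100) from grouped data.

    classes: list of (lower boundary, upper boundary, frequency), ascending.
    Assumes values are spread evenly within each class.
    """
    total = sum(freq for _, _, freq in classes)
    target = p / 100 * total  # position of the percentile in the ordered data
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Fraction of the way through this class at which the target falls.
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("percentile position lies beyond the data")

# Hypothetical grouped data with 20 observations.
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
print(percentile_grouped(classes, 50))  # 15.0 (median)
print(percentile_grouped(classes, 25))  # 10.0 (lower quartile)
```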
Be able to calculate standard deviation, including from summary statistics.
Students should be able to use the statistic \(S_{xx}\):
\(S_{xx} = \displaystyle\sum{(x-\bar{x})^2} = \displaystyle\sum{x^2} - \dfrac{\big(\sum{x}\big)^2}{n}\)
Use of standard deviation \(= \sqrt{\dfrac{S_{xx}}{n}}\) (or equivalent) is expected, but use of \(\sqrt{\dfrac{S_{xx}}{n-1}}\) (as used on spreadsheets) will be accepted.
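A minimal sketch of the calculation from summary statistics, using the formulas above. The function names and the summary values are hypothetical; the values correspond to the data 2, 4, 6, 8, 10.

```python
import math

def s_xx(n, sum_x, sum_x2):
    """S_xx = sum(x^2) - (sum(x))^2 / n."""
    return sum_x2 - sum_x ** 2 / n

def standard_deviation(n, sum_x, sum_x2, divisor_n_minus_1=False):
    """Standard deviation from summary statistics.

    Uses divisor n by default; set divisor_n_minus_1=True for the
    spreadsheet-style divisor n - 1.
    """
    divisor = n - 1 if divisor_n_minus_1 else n
    return math.sqrt(s_xx(n, sum_x, sum_x2) / divisor)

# Hypothetical summary statistics: n = 5, sum of x = 30, sum of x^2 = 220.
print(s_xx(5, 30, 220))                                          # 40.0
print(standard_deviation(5, 30, 220))                            # sqrt(8)
print(standard_deviation(5, 30, 220, divisor_n_minus_1=True))    # sqrt(10)
```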
Mean, mode and median; Variance and standard deviation; Range and interquartile range; Linear interpolation; Coding data
#2.2.4
Recognise and interpret possible outliers in data sets and statistical diagrams.
Any rule needed to identify outliers will be specified in the question. For example, use of \(Q_1 - 1.5 \times IQR\) and \(Q_3 + 1.5 \times IQR\) or mean \(\pm\ 3 \times\) standard deviation.
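As an illustration of the first rule, the sketch below computes the two fences and lists the values lying outside them. The quartiles are passed in directly rather than computed, since conventions for finding quartiles vary. The data, quartile values and function names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k * IQR, Q3 + k * IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def possible_outliers(data, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

# Hypothetical data set with given quartiles Q1 = 16 and Q3 = 22.
data = [2, 15, 17, 18, 20, 21, 23, 40]
print(iqr_fences(16, 22))                      # (7.0, 31.0)
print(possible_outliers(data, q1=16, q3=22))   # [2, 40]
```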
Select or critique data presentation techniques in the context of a statistical problem.
Students will be expected to draw simple inferences and give interpretations to measures of central tendency and variation. Significance tests, other than those mentioned in Section 5, will not be expected.
Be able to clean data, including dealing with missing data, errors and outliers.
For example, students may be asked to identify possible outliers on a box plot or scatter diagram.
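One possible approach to cleaning data is sketched below: missing entries and values outside a plausible range are set aside for inspection rather than silently discarded. The plausible range, the data (imagined as daily temperatures with a missing reading and two recording errors) and the function name `clean` are all hypothetical. Whether a flagged value should actually be removed depends on the context of the problem.

```python
def clean(records, low, high):
    """Split raw records into (kept values, rejected entries).

    An entry is rejected if it is missing (None) or lies outside [low, high].
    """
    kept, rejected = [], []
    for value in records:
        if value is None or not (low <= value <= high):
            rejected.append(value)
        else:
            kept.append(value)
    return kept, rejected

# Hypothetical raw data: None marks a missing reading.
raw = [12.1, None, 13.4, -99.0, 11.8, 250.0, 12.9]
kept, rejected = clean(raw, low=-30, high=50)
print(kept)      # [12.1, 13.4, 11.8, 12.9]
print(rejected)  # [None, -99.0, 250.0]
```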
Outliers and cleaning data; Select or critique data presentation techniques