Notes and guidance: large data set

Our new AS and A-level Maths specifications require students to study a large data set during their course. Each exam board chooses its own data set, based on Ofqual guidance.

The exams will include questions or tasks that relate to the prescribed large data set, giving a material advantage to students who have studied it.

The large data set is too big to be taken into an exam. Instead, we recommend using it as a classroom tool to support teaching the statistics content of the specification and to familiarise students with handling and manipulating data. Basic knowledge of a spreadsheet package such as Microsoft Excel or GeoGebra is required.

Techniques for studying the large data set

Study of the large data set could include the following techniques:

sampling
histograms
scatter graphs and correlation (not causation)
measures of central tendency and spread (standard deviation)
data cleansing
selecting and critiquing different presentation techniques
probability: mutually exclusive and independent events
brief interpretation of the data in order to answer short questions
deep interpretation of the data using given graphs and summaries
selecting from given graphs and summary data
modelling with trend lines for bivariate data
modelling with distributions and hypothesis testing
describing a situation where data needed to be collected and how it might be done
using and interpreting correlation coefficients (A-level only).

Students should be prepared for exam questions that require knowledge of any of the above.
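
Some of the techniques listed above (sampling, measures of central tendency and spread, correlation coefficients and trend lines) could also be explored in a short script alongside a spreadsheet. The following is a minimal Python sketch, offered as one possible classroom illustration rather than a prescribed method. The function names are hypothetical, the values are invented for illustration and are not taken from the large data set, and the standard deviation is assumed to use divisor n (statistics.stdev would use n - 1 instead).

```python
import math
import random
import statistics


def simple_random_sample(values, size, seed=0):
    """Return a simple random sample (without replacement) of the given size."""
    rng = random.Random(seed)  # fixed seed so a class can reproduce the same sample
    return rng.sample(values, size)


def summarise(values):
    """Return the mean, median and standard deviation (divisor n, an assumption)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
    }


def pearson_r(xs, ys):
    """Return the product moment correlation coefficient for paired data."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def trend_line(xs, ys):
    """Return (gradient, intercept) of the least squares line of y on x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


# Hypothetical paired values, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

print(simple_random_sample(x, 3))
print(summarise(y))
print(pearson_r(x, y))
print(trend_line(x, y))
```

A correlation coefficient close to 1 or -1 describes the strength of a linear association only; as noted in the list above, it does not show causation.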

Material advantage questions

Examples of questions that give a material advantage to students who have studied the large data set can be found in the sample assessment materials for AS (Paper 2, questions 14 and 16(b)) and A-level (Paper 3, questions 10(a) and 10(c)).

In answering these questions, students would have gained a material advantage through:

understanding the categories and sub-categories that the large data set uses
understanding how values in the large data set are rounded
knowledge of trends in the data
knowledge of outliers and other anomalies in the data.
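
Spotting outliers and other anomalies usually starts with data cleansing. As a hedged illustration, the Python sketch below discards entries that cannot be read as numbers and then flags values lying more than 1.5 times the interquartile range beyond the quartiles. The function names and the values are hypothetical and are not taken from the large data set. The 1.5 x IQR rule and the "inclusive" quartile method are assumptions; quartile conventions differ between packages, and an exam question may specify a different rule.

```python
import math
import statistics


def clean_numeric(entries):
    """Keep only entries that can be read as finite numbers."""
    cleaned = []
    for entry in entries:
        try:
            value = float(entry)
        except (TypeError, ValueError):
            continue  # skip blanks and text markers such as "n/a"
        if math.isfinite(value):
            cleaned.append(value)
    return cleaned


def iqr_outliers(values):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    # method="inclusive" is one of several quartile conventions (an assumption here)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical raw column, invented for illustration only.
raw = ["10", "12", "11", "n/a", "13", "12", "11", "", "10", "12", "40"]
values = clean_numeric(raw)
print(values)
print(iqr_outliers(values))
```

Whether a flagged value is a recording error or a genuine observation is a judgement that depends on knowing the data set, which is the kind of familiarity these questions reward.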