# Demystifying Statistics

## Connecting key elements of statistics to the NHS workplace

A one-day course that provides NHS information analysts with an overview of how to apply basic statistical techniques to their work.

**Demystifying Statistics** will suit information and performance analysts who want to learn—or who want to refresh their learning—about how key elements of the basic statistics syllabus can be applied to NHS problems and situations.

By the end of the course participants will understand the meaning of—and be able to calculate—standard deviation and standard error. They will be able to apply that knowledge in the calculation and drawing of confidence intervals. They will understand the concepts behind the null hypothesis and be able to calculate, interpret and explain P-values. They will understand when to use correlation and scatterplots as a way of describing relationships between variables.

No prior knowledge of statistics is assumed. However, you will need a basic grasp of Microsoft Excel to complete the exercises.

**Session 1 / Standard Deviation and the Normal Distribution**

1.1 How to explain standard deviation to a layperson

The first cornerstone of the course is that we have to establish a firm grasp of the concept of standard deviation. Using examples from NHS data we explore several datasets and examine what the standard deviation means in each one.

1.2 Calculating standard deviation the long and short way in Excel

It's not enough to be able to just invoke Microsoft Excel's =STDEV function; we need to understand standard deviation "properly", so we practice calculating it "the long way round" by working through the formula step by step.
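As a rough illustration of what "the long way round" involves, here is a minimal sketch in Python rather than Excel. The lengths of stay are made-up values for illustration only, and the function name is our own; the divisor of n - 1 matches what Excel's =STDEV uses for a sample.

```python
import math
import statistics

# Hypothetical lengths of stay (days) for eight patients; illustrative only.
lengths_of_stay = [2, 4, 4, 5, 7, 9, 10, 15]

def sample_sd_long_way(values):
    """Sample standard deviation, worked through step by step."""
    n = len(values)
    mean = sum(values) / n                                   # step 1: the mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # step 2: squared distance from the mean
    variance = sum(squared_deviations) / (n - 1)             # step 3: divide by n - 1 (sample variance)
    return math.sqrt(variance)                               # step 4: square root

long_way = sample_sd_long_way(lengths_of_stay)
short_way = statistics.stdev(lengths_of_stay)  # the "short way", analogous to =STDEV

print(f"long way: {long_way:.4f}, short way: {short_way:.4f}")
```

Both routes should agree; the point of the long way is to see each step that the built-in function hides.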

1.3 Understanding the *properties* of standard deviation

On its own, used simply as a measure of dispersion, standard deviation has limited use. However, as soon as we begin to understand how standard deviation "works" in relation to the Normal distribution, many possibilities (which we explore during the rest of the course) are opened up to us.

1.4 Creating and drawing frequency distributions using healthcare data

Many of the statistical "tests" that we use in the rest of the course depend on the underlying distribution of the data being approximately Normal, so it's important that we know how to quickly create, draw and assess distributions of our datasets in Excel.

**Session 2 / Standard Error and Confidence Intervals**

2.1 Moving from standard deviation to standard error

We begin the second session with an explanation of the concept of standard error, its close relationship to standard deviation, and what it means in relation to estimates based on random samples.

2.2 Calculating standard error in Excel

We practice using the formulae for standard error in Microsoft Excel, first for parametric data and then for non-parametric data.

2.3 Calculating confidence intervals with parametric and non-parametric data

We learn how to calculate confidence intervals for data based on sample estimates. For example, estimating the average length of stay of 2,000 inpatients from a random sample of 60 patients, or estimating the percentage of 5,000 Medicine of the Elderly inpatients aged over 85 based on a random sample of 60 patients.
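The two kinds of estimate in that example (a mean and a percentage) can be sketched as follows. This is a minimal sketch, assuming a 95% confidence level, the large-sample Normal approximation (z = 1.96), and no finite population correction. The sample figures and function names are hypothetical, not taken from the course material.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Approximate CI for a mean: estimate +/- z * standard error."""
    standard_error = sample_sd / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

def proportion_confidence_interval(count, n, z=1.96):
    """Approximate CI for a proportion using the Normal approximation."""
    p = count / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

# Hypothetical sample of 60 inpatients: mean stay 6.2 days, SD 3.1 days.
stay_low, stay_high = mean_confidence_interval(6.2, 3.1, 60)

# Hypothetical sample of 60 inpatients, 18 of whom are aged over 85.
prop_low, prop_high = proportion_confidence_interval(18, 60)

print(f"mean length of stay: {stay_low:.2f} to {stay_high:.2f} days")
print(f"proportion aged over 85: {prop_low:.1%} to {prop_high:.1%}")
```

With a sample as small as 60, the interval for the proportion is wide, which is exactly the kind of uncertainty a confidence interval is meant to make visible.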

2.4 Using confidence intervals to test for significant difference

We also devote time (this section is arguably the most important part of the course) to showing how to calculate confidence intervals for the *difference* between two means or proportions.
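A minimal sketch of the idea for two means, assuming two independent, reasonably large random samples and a 95% confidence level. The hospital figures and the function name are hypothetical.

```python
import math

def difference_in_means_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Approximate CI for (mean_a - mean_b) from two independent samples."""
    difference = mean_a - mean_b
    # The standard error of a difference combines the two squared standard errors.
    se_difference = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return difference - z * se_difference, difference + z * se_difference

# Hypothetical samples of 60 patients from each of two hospitals.
low, high = difference_in_means_ci(6.2, 3.1, 60, 5.1, 2.8, 60)

# If the interval excludes zero, the difference is significant at the 5% level.
excludes_zero = low > 0 or high < 0
print(f"difference: {low:.2f} to {high:.2f} days; excludes zero: {excludes_zero}")
```

Reading the result is the important part: an interval that straddles zero means the data are consistent with no real difference between the two hospitals.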

**Session 3 / Hypothesis Tests and P-Values**

3.1 Understanding the null hypothesis

If we're going to learn how to calculate P-values, we need to know what we're calculating the probability of, exactly. So we begin this session with a discussion of how to grasp and explain what a P-value is.

3.2 How to choose a hypothesis test

Which hypothesis test you use depends on the type of data you've got. We take a quick tour through the range of tests, and discuss why they are appropriate for different types of data.

3.3 Practicing with the large sample normal test

The practical element of this session concentrates on the large sample normal test. Using the same datasets that we used for the confidence interval exercises, we show how to calculate P-values.
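A minimal sketch of a large sample normal test for the difference between two means, assuming independent samples and a two-sided test. The figures and the function name are hypothetical; `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later).

```python
import math
from statistics import NormalDist

def large_sample_normal_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-sided P-value) under the null hypothesis of no difference."""
    se_difference = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se_difference
    # Probability of a result at least this extreme, in either direction,
    # if the null hypothesis were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical samples of 60 patients from each of two hospitals.
z, p_value = large_sample_normal_test(6.2, 3.1, 60, 5.1, 2.8, 60)
print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")
```

The test and a 95% confidence interval for the same difference answer the same question: a P-value below 0.05 corresponds to an interval that excludes zero.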

3.4 Quoting and interpreting P-values

To underline the importance of the words and phraseology we use when describing statistical significance, we devote time at the end of the exercise to the question of how we verbally describe our P-values, and what they mean in terms of inference.

**Session 4 / Correlation**

4.1 Association is not the same as causation

Conventional statistics courses often "warn" students of the dangers of confusing association and causation, with the result that analysts resolve never again to touch correlation and scatterplots with even a barge-pole. This course takes a different approach: we actively encourage analysts to explore relationships between variables, as long as they make clear that the analysis is exploratory.

4.2 When and how to use the coefficient of correlation (*r*)

We spend time practicing how to calculate *r* for different healthcare datasets, showing instances when it is—and when it is not—appropriate to do so.
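A minimal sketch of the calculation of *r* (the Pearson coefficient), together with one case where it misleads. The data and the function name are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# A perfectly straight-line relationship: r is 1 (up to rounding).
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# A perfect but curved relationship (y = x squared): r is 0,
# even though y is completely determined by x.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(f"linear: r = {r_linear:.2f}, curved: r = {r_curved:.2f}")
```

Because *r* only measures straight-line association, a value near zero does not mean "no relationship", which is one reason to draw the scatterplot before quoting the coefficient.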

4.3 Visualizing relationships using scatterplots

When it comes to communicating relationships between variables, scatterplots are a highly effective visual method, and we spend time practicing the creation and design of scatterplots.

NHS analysts don't just need a grasp of basic statistics; they also need to know how to apply that statistical know-how in order to help them solve the kinds of problems they are confronted with each day in the workplace. Statistics courses often fail to address this training need because their examples are unrelated to participants' workplace situations. **Demystifying Statistics**, however, is entirely given over to mainstream NHS examples, and it teaches how to perform the calculations in Excel.

**Demystifying Statistics** has grown out of our initial consultations with NHS information managers about the training needs of analysts. A common thread that emerged from those conversations was that analysts need to be fluent in the key statistical concepts such as standard deviation, standard error, confidence intervals, P-values and correlation.

But it has now developed into much more than that. **Demystifying Statistics** is a course that addresses a training need that runs right down the middle of what NHS information analysts do, and that is to do with the task of comparison.

Nearly every piece of analysis ever done by an NHS information analyst is about comparing things. This year with last year; this hospital with that hospital; this month's performance with target performance. But we often fail to adjust the comparison so that we have a level playing-field. In other words, we hardly ever standardize our data (for age, sex, social deprivation, whatever); and we hardly ever adjust for random fluctuation.

It is this need to inject intelligence into our comparison data analysis that now underpins **Demystifying Statistics**.

**Demystifying Statistics** can be booked as an on-site workshop for £1,250+VAT, and up to 12 participants can be accommodated in each workshop session. A degree of familiarity with Microsoft Excel is helpful for this training course.