Assessment is used less as a tool for improvement than as a manacle for accountability, straitjacketing teaching and shackling learning.
‘Everybody has won, and all must have prizes’
They began running when they liked, and left off when they liked, so that it was not easy to know when the race was over. When they had been running for half an hour or so, the Dodo called out ‘The race is over!’ and they all crowded round, panting and asking, ‘But who has won?’
This question the Dodo could not answer without a great deal of thought, and sat for a long time with one finger pressed against his forehead, while the rest waited.
At last the Dodo announced: ‘Everybody has won, and all must have prizes.’
When I was doing my GCSEs, I was always a bit puzzled as to why we never got our exam papers back. It seemed to leave me blind as to how to improve. Then, when I first became a teacher and started assessing students’ work, I felt a bit like the Dodo. I’d look from the assessments to the criteria in bemusement. I’d agonise about which level to label a piece of work with. I’d gingerly assign one, but if I looked again later, I’d often come up with a completely different level. But as everyone had to move onwards and upwards, I tended to err towards optimism, as if all should get prizes for progress.
Last week I posted on formative assessment, and how it got hijacked by gimmicks. This week I want to write about how the topsy-turvy world of summative assessment straitjackets teaching and shackles learning.
What’s assessment for?
Assessment is one of the most complex issues in education. Fundamentally, there’s no agreement on what it’s for. Tim Oates at Cambridge Assessment has enumerated about 80 separate purposes that assessment data are used for. Broadly, they include diagnosing students’ strengths and weaknesses in subjects; evaluating whether students have learned what teachers have taught; judging what students are learning more broadly; challenging the brightest students whilst allowing the weakest access; deciding whether to award qualifications and distinction; reporting to parents on children’s progress; comparing departments; benchmarking between schools; inspecting the quality of teaching and achievement in schools; monitoring performance locally; and publishing data to hold the government accountable for public spending. The purposes of assessment are almost innumerable, and many of them conflict. The more purposes an assessment has, the more strain is placed on its design: GCSEs, for instance, encompass several. The crux is the tension between assessment for improvement and assessment for accountability.
The assessment for accountability regime gets teachers fixated on exam drills, hooked on levels, ratcheting up ladders and bogged down in bureaucracy, all of which straitjacket teaching, shackle learning, and crowd out important formative assessment.
The high-stakes exam system and league table metrics exert inordinate pressure on teachers. This anecdote from an English teacher, which will seem familiar to many, reveals the pervasive, chronic short-termism of drilling in exam technique and timing that the system forces us into:
My year 11 pupils couldn’t have told you the name of a poet, but they could tell you that section A of paper 1 assessed your reading skills, that it involved 4 questions, that each question had 10 marks, that question 3 nearly always dealt with the writer’s techniques, that they should spend about 10 minutes reading the passage and then 10 minutes answering each question that followed, which left about 70 minutes for section B, which assessed writing and involved two tasks, both of which were worth 20 marks, but the first of which you should only spend 25 minutes on, leaving 45 for the piece of creative writing, which in any case you would write beforehand, memorise and regurgitate in the exam so you didn’t have to waste any time thinking and structuring a story from scratch.
It’s not just at GCSE that teachers are affected, but also at Key Stages 2 and 3. The 2011 Lord Bew report on assessment and accountability took 12 weeks and drew on 4,000 online responses and 50 interviews. Respondents with ‘considerable and significant concerns’ criticised National Curriculum levels as ‘too broad, inconsistent across Key Stages, not specific enough about a pupil’s attainment in any given subject and difficult to interpret, including for parents.’ The counter-intuitive consequences of the high-stakes accountability system, the report concluded, were pervasive teaching to the test, a ceiling on pupils’ attainment and an impediment to their progress.
It seems, however, as if we’re all addicted to grades and levels. Dylan Wiliam uses the colourful analogy that we have our students hooked on them like drug addicts, that we teachers are pushers and parents are co-dependents. Like any addiction, it absorbs attention, temporarily gives gratification, artificially inflates self-esteem and exacerbates the problem it seeks to remedy. Wiliam’s summary of the research is that constantly giving grades actually lowers achievement. Not only that, but giving comments with grades means that students don’t read the comments, as they’re too busy comparing grades. He concludes that grades inhibit learning.
The problem with national levels is that the success descriptors are vague, abstracted, overcomplicated, overlapping, over-generalised, jargon-heavy, vacuous, and almost unintelligible. Try using the English descriptors below to reliably distinguish between pupils’ writing at levels 4, 5 and 6 in assessment focus 2: ‘understand, describe, select or retrieve information, events or ideas from texts and use quotation and reference to text’:
- For a level 4, ‘comments supported by some generally relevant textual reference or quotation’
- For a level 5, ‘comments generally supported by relevant textual reference or quotation, even when points made are not always accurate’
- For a level 6, ‘commentary incorporates apt textual reference and quotation to support main ideas or argument’.
If that impenetrable bureaucratic educationalese makes your brain hurt, imagine what it’s like marking, levelling and moderating 30 essays using sub-level guesstimates across four assessment foci. Three different teachers could assign the same piece of work three different sub-levels; in fact, one teacher might on separate occasions assign three different sub-levels to one piece of work. Validity and reliability depend on the precision and specificity of assessment criteria, and both are compromised when the criteria are vague and vacuous.
Numerical levels, though, are easily compared across departments and schools. Parents thirst for them, managers and teachers persist with pushing them, and students become more and more addicted to them.
The linearity of levels has led to the idea that a student’s grade or level should never go down. Pupils, parents and managers complain unless continual upward progression is evidenced by numerical levels, regardless of reliability. But this anecdote illustrates the problem with generic levels when applied across content of varying difficulty:
In a media unit in English, the assessment question was: ‘How does Spielberg create drama and tension in the film Jaws?’ When marking these essays, there were kids who had effectively got to the top of the level grid. They’d analysed the film perfectly. Admittedly they’d spelt some words wrong, but they weren’t being marked on that. They were being marked on how well they could analyse, and I didn’t see how they could have done it better. So what was I supposed to do? Give them a level 8, the top grade you can get at Key Stage 3? But I knew these pupils were not level 8 students. If I gave them a level 8 now, there would be outrage when at the next assessment on Shakespeare they went back to level 5. But if I gave them a level 5, I was at a loss as to what to put for a target. In the end, I gave them level 5 and made up a target that wasn’t on the grid.
As in a game of snakes and ladders, no one wants to be the snake that moves pupils down.
The classic example of how assessment and accountability bog teachers down in bureaucracy is the last government’s 2008 DCSF £50 million initiative: ‘Assessing Pupil Progress’ (APP). It diagnosed school assessment’s ‘unnecessarily bureaucratic, time-consuming and workload-intensive burdens’, and promised to ‘replace existing bureaucratic internal school assessment practices with a more streamlined and purposeful approach’. It had precisely the opposite effect, as another English teacher writes here:
The APP grids came on a double-sided A3 sheet consisting of the 14 English Assessment Focuses, or AFs, broken down by the 8 key stage three national curriculum levels – that’s 112 tiny little boxes of skills targets. I taught 90 key stage three pupils, so that meant 90 double-sided sheets of A3 to update every half term. That was how often we updated the sheets, but there was a suggestion that it should be updated after every piece of assessed work. If that’s called reducing bureaucracy, I’d love to see what increasing it is like.
Because of the imprecision, some schools started assigning national levels to 2 decimal places, giving out levels like 4.45. But this isn’t just anecdotal. From 2008 to 2010, the teachers’ union NASUWT received increasing numbers of reports from teachers about the burdens of APP. It put this down to ‘inappropriate approaches to implementation promoting its use in ways never intended’. It was only ever supposed to be used twice or three times a year, NASUWT argued. But I think it’s more likely that the entire micro-prescriptive premise of APP was fundamentally flawed in the first place.
The assessment for accountability regime affects learning in four ways: it imposes, obscures, weakens and demotivates. It imposes crude and unhelpful labels on students: between the ages of 7 and 14, countless conversations between students compare what level they are: ‘I’m a level 5b: what are you?’ It obscures what students actually know and can do: who knows what 5b actually means in Maths? The strength of their grasp of number, algebra, statistics and geometry stays hidden. Assigning a ‘best fit’ level across all subject areas weakens teachers’ and parents’ understanding of pupils’ specific weaknesses or misunderstandings. Worst of all, expected national progress of 2 national levels from Years 7-9 is pretty demotivating for students. What if students don’t progress as ‘expected’? ‘Ellie, at the start of Year 7 you were on a level 5. By the end, you’re on a level 5. Well done.’ It’s not very motivating, is it? Assessment for accountability even fails on its own terms. As Tim Oates points out, ‘generalised reporting using levels obscures the fact that too great a proportion of pupils fail to attain elements of the curriculum that are vital for the next phase of their education’.
So why do we still use levels?
If levels are so evidently counter-productive for learning, why do so many schools still use them? In June 2012 the DfE took the decision to scrap national levels and not replace them in its letter to the expert panel on the National Curriculum Review. So why haven’t more schools taken the chance to do away with them?
In a word, accountability. Schools are locked into levels because Ofsted inspections and data dashboards measure progression in levels; league tables measure progression from baseline levels; and senior leadership teams, perversely incentivised by these metrics, enforce levels on teachers. Even if a school somehow devised its own system of progression, how would it benchmark?
The 2011 Bew report stated that ‘in the short term, we believe we need to retain levels as a means of measuring pupils’ progress and attainment. Key Stage 1 continues to be reported by levels, and therefore to measure progress robustly, Key Stage 2 results should be reported in the same way’. This sounds identical to the argument for using levels at KS3: that KS2 data is measured in levels, so there’s no other option.
With the focus on measurement, benchmarking and accountability, the logic seems inescapable. But such regressive logic reminds me of the start of Stephen Hawking’s A Brief History of Time, when he mentions a conversation Bertrand Russell had after a lecture he gave on the universe. An old lady came up to him, and said, ‘Rubbish. The world is really a flat plate on the back of a giant tortoise.’ When he asked what the tortoise was standing on, she replied: ‘Very clever, young man, very clever. But it’s turtles all the way down!’ It sometimes seems as if the education system yields to none in embracing the logic of the Dodo.
“It’s turtles all the way down”.
Mind you, it’s easier to criticise than to propose a credible alternative. But it’s also important to diagnose before you prescribe. If this post diagnoses the extent of the problem, the next tries to envisage an alternative assessment regime. If Wiliam is right and we’re addicted to levels, as my friend Harry said: we’re going to need a methadone.