What is Assessment for Learning?
It would be quite reasonable for any teacher to ask, with a degree of puzzlement, why something called Assessment for Learning (AfL) has moved centre stage in the drive to improve teaching and learning. The past experience of many teachers, pupils and their parents has been of assessment as something that happens after teaching and learning. The idea that assessment can be an integral part of teaching and learning requires a significant shift in our thinking, but this is precisely what Assessment for Learning implies. So, before we look at what research on AfL can tell us, it is important to understand what it is.
The nature of assessment
It is no accident that the word ‘assessment’ comes from a Latin word meaning ‘to sit beside’ because a central feature of assessment is the close observation of what one person says or does by another, or, in the case of self-assessment, reflection on one’s own knowledge, understanding or behaviour. This is true of the whole spectrum of assessments, from formal tests and examinations to informal assessments made by teachers in their classrooms many hundreds of times each day. Although the form that assessments take may be very different – some may be pencil and paper tests whilst others may be based on questioning in normal classroom interactions – all assessments have some common characteristics. They all involve: i) making observations; ii) interpreting the evidence; iii) making judgements that can be used for decisions about actions.
In order to carry out assessment, it is necessary to find out what pupils know and can do or the difficulties they are experiencing. Observation of regular classroom activity, such as listening to talk, watching pupils engaged in tasks, or reviewing the products of their classwork and homework, may provide the information needed, but on other occasions it may be necessary to elicit the information needed in a very deliberate and specific way. A task or test might serve this purpose but a carefully chosen oral question can be just as effective. Pupils’ responses to tasks or questions then need to be interpreted. In other words, the assessor needs to work out what the evidence means.
Interpretations are made with reference to what is of interest, such as specific skills, attitudes or different kinds of knowledge. These are often referred to as criteria and relate to learning goals or objectives. Usually observations as part of assessment are made with these criteria in mind, i.e. formulated beforehand, but sometimes teachers observe unplanned interactions or outcomes and apply criteria retrospectively. Interpretations can describe or attempt to explain a behaviour, or they can infer from a behaviour, e.g. what a child says, that something is going on inside the child’s head, e.g. thinking. For this reason interpretations are sometimes called inferences.
On the basis of these interpretations of evidence, judgements are made. These involve evaluations. It is at this point that the assessment process looks rather different according to the different purposes it is expected to serve and the uses to which the information will be put.
A distinction between formative and summative (summing-up) purposes has been familiar since the 1960s although the meaning of these two terms has not been well understood. A more transparent distinction, meaning roughly the same thing, is between assessment of learning, for grading and reporting, and assessment for learning, where the explicit purpose is to use assessment as part of teaching to promote pupils’ learning. AfL becomes ‘formative’ when evidence is actually used to adapt teaching and learning practices to meet learning needs. AfL came to prominence, as a concept, after the publication in 1999 of a pamphlet with this title by the Assessment Reform Group, a small group of UK academics who have worked, since 1989, to bring evidence from research to the attention of teachers and policymakers.
Assessment for learning
In AfL, observations, interpretations and criteria may be similar to those employed in assessment of learning, but the nature of judgements and decisions that flow from them will be different. In essence, AfL focuses on what is revealed about where children are in their learning, especially the nature of, and reasons for, the strengths and weaknesses they exhibit. AfL judgements are therefore concerned with what learners might do to move forward.
The Assessment Reform Group (2002a) gave this definition of assessment for learning:
Assessment for Learning is the process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there.
One significant element of this definition is the emphasis on learners’ use of evidence. This draws attention to the fact that teachers are not the only assessors. Pupils can be involved in peer- and self-assessment and, even when teachers are heavily involved, pupils need to be actively engaged. Only learners can do the learning so they need to act upon information and feedback if their learning is to improve. This requires them to have understanding but also the motivation and will to act. The implications for teaching and learning practices are profound and far-reaching.
Assessment of learning
In contrast, the main purpose of assessment of learning is to sum up what a pupil has learned at a given point. As such it is not designed to contribute directly to future learning although high-stakes testing can have a powerful negative impact (Assessment Reform Group, 2002b). In assessment of learning the judgement will explicitly compare a pupil’s performance with an agreed standard or with the standards achieved by a group of pupils of, say, the same age. The judgement may then be in the form of ‘has/has not’ met the standard or, more usually, on a scale represented as scores or levels. These are symbolic shorthand for the criteria and standards that underpin them. Representation in this concise, but sometimes cryptic, way is convenient when there is a need to report to other people such as parents, receiving teachers at transition points, and managers interested in monitoring the system at school, local and national level. Reporting, selection and monitoring are therefore prominent uses of this kind of assessment information.
Can summative data be used formatively?
Scores and levels, especially when aggregated across groups of pupils, are often referred to as ‘data’ although any information, systematically collected, can be referred to in this way. Aggregated summative data are useful for identifying patterns of performance and alerting teachers to groups that are performing above or below expectations. However, schools need to investigate further if they are to discover the reasons for these patterns in order to plan what to do. Similarly, at the level of the individual pupil, summative judgements are helpful in indicating levels of achievement and, by implication, the next levels that need to be aimed for if learners are to make progress. However, scores and levels need to be ‘unpacked’ to reveal the evidence and criteria they refer to if they are to make any contribution to helping pupils to take these next steps. What is important is the qualitative information about the underlying features of a performance that can be used in feedback to pupils. For example, telling a child that he has achieved a Level 4 will not help him to know what to do to achieve a Level 5, although exploring with him the features of his work that led to this judgement, and explaining aspects of it that he might improve, could help him to know what to do to make progress. In this context the summative judgement (in number form) is stripped away and the teacher goes back to the evidence (observation and interpretation) on which it was made. She then makes a formative judgement (in words) about what the evidence says about where the learner is, where he needs to go, and how he might best get there.
By changing the nature of the judgement, assessments designed originally for summative purposes may be converted into assessments for learning. However, not having been designed to elicit evidence that will contribute directly to learning, they may be less suited to that purpose than assessments designed with AfL in mind. External tests are even more problematic than summative teacher assessments, because teachers rarely have access to enough of the evidence on which scores and levels are based, although analyses of common errors can be useful.
What does research say about how to improve Assessment for Learning?
The key text is the review of research by Paul Black and Dylan Wiliam (1998a, 1998b) which was commissioned by the Assessment Reform Group. This reviews research carried out across the world, in many sectors of education and subject areas, from 1987 to 1997. It refers also to two previous reviews of research. The summary below draws on this work, and adds some insights from studies carried out since 1998.
Black and Wiliam analysed 250 studies of which 50 were a particular focus because they provided evidence of gains in achievement after ‘interventions’ based on what we might now call AfL practices. These gains, measured by summative tests given before and after the intervention, produced standardised effect sizes of between 0.4 and 0.7. An effect size of 0.4 would move the average student up half a level at Key Stage 2; an effect size of 0.7 would move them up three-quarters of a level. There was evidence that gains for lower-attaining students were even greater. These findings have convinced many teachers and policy makers that AfL is worth taking seriously.
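As a rough aid to reading these effect sizes, the sketch below converts a standardised effect size into the percentile of a comparison group that the average pupil in the intervention group would reach. It assumes that scores are normally distributed with the same spread in both groups; this is an illustrative assumption, not something reported in the review, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_of_average_pupil(effect_size: float) -> float:
    """Percentile (0-100) of the comparison group reached by the average
    pupil in the intervention group.

    Assumes normally distributed scores with equal standard deviations in
    both groups (an illustrative assumption, not taken from the review).
    """
    return 100 * NormalDist().cdf(effect_size)


# The two ends of the range quoted in the text.
for d in (0.4, 0.7):
    print(f"effect size {d}: about percentile {percentile_of_average_pupil(d):.0f}")
```

Under these assumptions, an effect size of 0.4 places the average pupil at roughly the 66th percentile of the comparison group, and an effect size of 0.7 at roughly the 76th.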
The innovations introduced into classroom practice involved some combination of the following:
Asking questions, either orally or in writing, is crucial to the process of eliciting information about the current state of a pupil’s understanding. However, questions phrased simply to establish whether pupils know the correct answer are of little value for formative purposes. Pupils can give right answers for the wrong reasons, or wrong answers for very understandable reasons. Superficially ‘correct’ answers need to be probed and misconceptions explored. In this way pupils’ learning needs can be diagnosed.
Recent research in science education, by Millar and Hames (2003), has shown how carefully designed diagnostic ‘probes’ can provide quality information about pupils’ understanding to inform subsequent action. The implication is that teachers need to spend time planning good diagnostic questions, possibly with colleagues. Pupils can be trained to ask questions too, and to reflect on answers. They need thinking time to do this, as they do to formulate answers that go beyond the superficial. Increasing wait-time, between asking a question and taking an answer, from the average of 0.9 of a second, can be productive in this respect. So can a ‘no hands up’ rule which implies that all children can be called upon to answer and that their answers will be dealt with seriously, whether right or wrong.
All these ideas call for changes in the norms of talk in many classrooms. By promoting thoughtful and sustained dialogue (Alexander, 2004), teachers can explore the knowledge and understanding of pupils and build on this. The principle of ‘contingent teaching’ underpins this aspect of AfL.
Giving appropriate feedback
Feedback is always important but it needs to be approached cautiously because research draws attention to potential negative effects. Kluger and DeNisi (1996) reviewed 131 studies of feedback and found that, in two out of five studies, giving people feedback made their performance worse. Further investigation revealed that this happened when feedback focused on their self-esteem or self-image, as is the case when marks are given, or when praise focuses on the person rather than the learning (Dweck, 1999). Praise can make pupils feel good but it does not help their learning unless it is explicit about what the pupil has done well.
This point is powerfully reinforced by research by Butler (1988) who compared the effects of giving marks as numerical scores, comments only, and marks plus comments. Pupils given only comments made 30% progress and all were motivated. No gains were made by those given marks or those given marks plus comments. In both these groups the lower achievers also lost interest. The explanation was that giving marks washed out the beneficial effects of the comments. Careful commenting works best when it stands on its own.
Another study, by Day and Cordón (1993), found that there is no need for teachers to give complete solutions when pupils ‘get stuck’. Indeed, Year 4 pupils retained their learning longer when they were simply given an indication of where they should be looking for a solution (a ‘scaffolded’ response). This encouraged them to adopt a ‘mindful’ approach and active involvement, which rarely happens when teachers ‘correct’ pupils’ work.
Research also shows how important it is that pupils understand what counts as success in different curriculum areas and at different stages in their development as learners. This entails sharing learning ‘intentions, expectations, objectives, goals, targets’ (these words tend to be used interchangeably) and ‘success criteria’. However, because these are often framed in generalised ways, they are rarely enough on their own. Pupils need to see what they mean, as applied in the context of their own work, or that of others. They will not understand criteria right away, but regular discussions of concrete examples will help pupils develop understandings of quality.
In a context where creativity is valued, as well as excellence, it is important to see criteria of quality as representing a ‘horizon of possibilities’ rather than a single end point. Notions of formative assessment as directed towards ‘closing the gap’, between present understanding and the learning aimed for, can be too restrictive if seen in this way, especially in subject areas that do not have a clear linear or hierarchical structure.
Peer- and self-assessment
The AfL practices described above emphasise changes in the teacher’s role. However, they also imply changes in what pupils do and how they might become more involved in assessment and in reflecting on their own learning. Indeed, questioning, giving appropriate feedback and reflecting on criteria of quality can all be rolled up in peer- and self-assessment. This is what happened in a research study by Fontana and Fernandes (1994). Over a period of 20 weeks, primary school pupils were progressively trained to carry out self-assessment that involved setting their own learning objectives, constructing relevant problems to test their learning, selecting appropriate tasks, and carrying out self-assessments. Over the period of the experiment the learning gains of this group were twice as big as those of a matched ‘control’ group.
The importance of peer- and self-assessment was also illustrated by Frederiksen and White (1997) who compared learning gains of 4 classes taught by each of 3 teachers over the course of a term. All the classes had an evaluation activity each fortnight. The only thing that was varied was the focus of the evaluation. Two classes focused on what they liked and disliked about the topic; the other two classes focused on ‘reflective assessment’ which involved pupils in using criteria to assess their own work and to give one another feedback. The results were remarkable. All pupils in the ‘reflective assessment group’ made more progress than pupils in the ‘likes and dislikes group’. However, the greatest gains were for pupils previously assessed as having weak basic skills. This suggests that low achievement in schools may have much less to do with a lack of innate ability than with pupils’ lack of understanding of what they are meant to be doing and what counts as quality.
From 1999 to 2001 a development and research project was carried out by Black and colleagues (2002, 2003) to test some of these findings in a British context because much of the earlier research came from other countries. They found peer-assessment to be an important complement to self-assessment because pupils learn to take on the roles of teachers and to see learning from their perspective. At the same time they can give and take criticism and advice in a non-threatening way, and in a language that children naturally use. Most importantly, as with self-assessment, peer-assessment is a strategy for ‘placing the work in the hands of the pupils’.
Formative use of summative tests
Black et al.’s recent study was of 24 science and mathematics teachers in secondary schools. In the second year, 12 English teachers also joined the project. The giving of regular tests was a familiar part of practice in these contexts, which some teachers were reluctant to relinquish. Attempts were therefore made to convert the practice to more formative purposes. Teachers took time to discuss questions that gave particular difficulty and peer tutoring was used to tackle problems encountered by a minority. Thus teachers and pupils delved beneath the marks and grades to examine the evidence of learning, on which the summative judgements were based, and to find formative strategies for improvement. These researchers argue that although there is evidence of harmful effects of summative assessment on teaching, it is unrealistic to expect teachers and pupils to practise separation between assessment of learning and assessment for learning. So the challenge is to achieve a more positive relationship between the two.
Thoughtful and Active Learners
The ultimate goal of AfL is to involve children in their own assessment so that they can reflect on where they are in their own learning, understand where they need to go next and work out what steps to take to get there. The research literature sometimes refers to this as the processes of self-monitoring and self-regulation. It could also be a description of learning how to learn. (See www.learntolearn.ac.uk for information about a research project linking AfL to learning to learn, which is due to report in 2005.) For this to be effective, children need to become both thoughtful and active learners. They must, in the end, take responsibility for their own learning; the teacher’s role is to help them towards this goal. Assessment for Learning is a vital tool for this purpose.
Note: Asterisked references are short booklets or leaflets designed for busy teachers to read.
*Alexander, R. (2004) Towards Dialogic Teaching: rethinking classroom talk. Dialogos UK Ltd.
*Assessment Reform Group (1999) Assessment for Learning: Beyond the black box. University of Cambridge School of Education.
*Assessment Reform Group (2002a) Assessment for Learning: 10 Principles. University of Cambridge Faculty of Education.
*Assessment Reform Group (2002b) Testing, Motivation and Learning. University of Cambridge Faculty of Education.
Black, P. and Wiliam, D. (1998a) Assessment and Classroom Learning, Assessment in Education: Principles, Policy and Practice, 5(1): 5-75.
*Black, P. and Wiliam, D. (1998b) Inside the black box: raising standards through classroom assessment. King’s College London, School of Education (now available from NFER/Nelson).
*Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2002) Working inside the black box: Assessment for learning in the classroom. King’s College London, Department of Education and Professional Studies (now available from NFER/Nelson).
Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2003) Assessment for Learning: Putting it into practice. Maidenhead, Open University Press.
Butler, R. (1988) Enhancing and undermining intrinsic motivation: the effects of task-involving and ego-involving evaluation on interest and performance, British Journal of Educational Psychology, 58: 1-14.
Day, J. and Cordón, L. (1993) Static and Dynamic measures of ability: an experimental comparison, Journal of Educational Psychology, 85: 76-82.
Dweck, C. (1999) Self-Theories: Their role in motivation, personality and development. Philadelphia: Psychology Press.
Fontana, D. and Fernandes, M. (1994) Improvements in mathematics performance as a consequence of self-assessment in Portuguese primary school pupils, British Journal of Educational Psychology, 64: 407-417.
Frederiksen, J. and White, B. (1997) Reflective assessment of students’ research within an inquiry-based middle school science curriculum. Paper presented at the Annual Meeting of the AERA, Chicago.
Kluger, A. and DeNisi, A. (1996) The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory, Psychological Bulletin, 119: 254-284.
*Millar, R. and Hames, V. (2003) Towards Evidence-based Practice in Science Education 1: Using diagnostic assessment to enhance learning. Teaching and Learning Research Programme Research Briefing No. 1, University of Cambridge Faculty of Education.
Assessment Reform Group
King’s College London Assessment for Learning Group
ESRC TLRP Learning how to Learn Project
Association for Achievement and Improvement Through Assessment