Commentary

Are humanitarian evaluations fit for purpose?

Evaluations can play a key role in capturing learning from humanitarian action. But how well do they actually do this – and how does this relate to their accountability function?

Susanna Morrison-Metois, ALNAP’s Senior Research Fellow for Evaluation and Learning, caught up with James Darcy, co-author of ALNAP’s discussion paper Missing the point?, on the state of evaluation practice, the role of independent evaluation and the need to think beyond standard evaluation approaches.

Susanna: In the paper, you raise questions about ‘accountability’ and ‘learning’ in the context of humanitarian evaluations. What are your main concerns here?

James: In thinking about the purpose of evaluations, we generally take accountability and learning as ‘given’. But we think too little about what these actually mean, how they are best achieved, how they are linked – and whether evaluations actually promote them. From our review of published evaluations, we found that current evaluation practice often fails to deliver well on either of these agendas.

In our paper, we argue that accountability and learning are linked by what we call the validation function of independent evaluation. This involves testing and, where necessary, challenging prevailing organisational narratives and assumptions about the value of a given intervention. We believe this is a key function of independent evaluation – one that is under-recognised – and it forms a bridge between accountability and learning. The ‘test and challenge’ element clearly has an accountability dimension. But in our view, it is also central to the learning agenda, and in fact may be an essential condition for genuine organisational self-reflection.

Susanna: Isn’t this a rather confrontational view of evaluation?

James: No, I really don’t believe so. Evaluation can’t succeed if it is confrontational. The point of evaluation is not fault-finding: it is to assess from an external perspective the value of an intervention and the way it has been done, mainly in order to promote internal re-assessment and inform future decisions. Crucially, validation as it is understood here should be a joint endeavour. I think the idea is to hold up a mirror to an organisation and ask it to reflect on what it sees, good or bad. Of course, the organisation has to recognise itself in the mirror – and that is part of the challenge for an evaluator. The mirror has to present a ‘true’ reflection. But if it is to be useful, evaluation also has somehow to show something more than surface appearance, which generally means going beyond a technical critique of a given programme and helping an organisation see its intervention in a wider and deeper perspective.

A good evaluation process itself should aim to promote self-reflection. Interviews in which staff are invited to reflect on their experience are part of this; and triangulation with other sources of evidence and experience (an essential part of validation) allows comparison with other points of view. The presentation and discussion of evaluation findings with staff is itself part of the process of validation, providing an opportunity to re-assess collectively and to look forward, using the evaluation findings as a springboard.

"A programme cannot usefully be evaluated purely in its own terms. We need to think about what an agency is doing (or has done) in context; and to assess what is not being done as well as what is."

Independent evaluation can introduce perspectives that may be under-represented or missing from internal discussions and is also likely to introduce elements of complexity that provide a more nuanced picture. Apart from its basic accountability function, it is mainly in this sense that external evaluation differs from more internal learning processes, even those that have some external facilitation. Both have their place; and current efforts at learning from the response to COVID-19 provide an opportunity to think further about the kinds of process most likely to promote genuine organisational learning.

Overall, while evaluations are sometimes too wide in scope, I think they tend to have too narrow a frame of reference. A programme cannot usefully be evaluated purely in its own terms. We need to think about what an agency is doing (or has done) in context; and to assess what is not being done as well as what is.

In evaluating against the standard criteria (effectiveness, relevance etc.), evaluations need to consider agendas like support to affected communities’ own initiatives; programme quality and responsiveness to change; and the real added value of an intervention in the particular context. But evaluators also have to engage with the messy reality that aid agencies face in practice, not with some idealised version of the world. Evaluations that do not understand the (often huge) challenges that decision-makers face are unlikely to be either useful or fair.

Susanna: Are the accountability and learning agendas in tension with each other? And if so, how should that tension be managed?

James: There are potential tensions here. Organisations and individuals are naturally under some pressure to show that interventions have been (essentially) successful and that they have provided substantial added value to a wider crisis response. This is partly a matter of organisational incentives, about an agency’s reputation and future funding; but it also has strong psychological dimensions, individual and collective. Shared narratives are an important part of the ‘glue’ that holds organisations and teams together and helps establish their identity. But they can also be misleading or incomplete.

Evaluations by definition pose a potential challenge to such narratives and beliefs about value, and this can act as a disincentive to genuine self-reflection. Enabling a process of reflection depends on leadership from managers and the existence of an open learning culture in the agency concerned.

Managing these tensions involves reaching an understanding about the necessary conditions for satisfying both agendas (accountability and learning) within the same process. For example, accountability may demand that the terms of reference should allow evaluators to raise their own questions (within the overall scope of the evaluation) as well as addressing those that have been pre-defined. But this can also be seen as a condition for a genuinely open learning agenda, perhaps linked to a defined process of policy review. Independent evaluators can play a unique role as catalysts for reflection and change, but the reaction in question has to happen within the organisation(s) concerned. Without that, external evaluation loses much of its value.

Susanna: How does all this relate to the generation of useful and reliable evidence for decision-making?

James: We found that the quality of evaluations in our study sample was highly variable from an evidential standpoint, depending largely on the quality and depth of the evidence relied upon and the cogency of the related analysis. This in turn depended partly on how well the evaluation function was resourced and supported. Evaluations tend to rely heavily on compiling and reviewing secondary or indirect evidence about a given intervention, much of it generated by the organisation concerned; and they are usually only able to assess primary (direct) evidence to a limited extent. The main evidence generated by evaluations is a set of judgements about value and performance, judgements that are themselves more or less well grounded in – and justified against – evidence.

"Fully assessing the value and effect of an intervention means going deeper than technical performance and may require non-evaluative as well as standard evaluation approaches. This might include behavioural and socio-economic surveys or studies – e.g., about household responses to economic or security threats, and the role played by aid interventions – that may require a more longitudinal approach."

Evaluations can generate evidence of different kinds, but we should not overload them with expectations in this regard. They are only one of the possible sources of evidence. Fully assessing the value and effect of an intervention means going deeper than technical performance and may require non-evaluative as well as standard evaluation approaches. This might include behavioural and socio-economic surveys or studies – e.g., about household responses to economic or security threats, and the role played by aid interventions – that may require a more longitudinal approach. Evaluations cannot reasonably be expected to substitute for such studies. In other words, evaluation in a fuller sense may require a broader approach, and one that is not so dependent on a single ‘point in time’ enquiry. There is a conceptual link here to the monitoring function that we explore in the Missing the point paper.

In the paper, we raise particular questions about the value of evidence from evaluations concerning effectiveness, perhaps the core evaluation criterion. To what extent (and how) has an intervention worked? There is usually both a relatively superficial and a deeper answer to this question. We conclude in the paper that the argument for effectiveness and positive impact is made with too little reference to evidence about actual effects in the world. Here again I think the link with other sources of evidence (including evidence from monitoring) is essential: we should not expect too much from evaluation alone. But this is a big subject and maybe a conversation for another day!
