
In vogue and at odds: systemic change and new public management in development

Updated: Aug 6, 2018


Two concurrent but incompatible trends have emerged in development in recent years. Firstly, Evidence-based Policy (EBP) and the results agenda have become ubiquitous amongst government policymakers, including in development. Secondly, there has been a realisation of the utility of systemic approaches to development policy and programming in order to bring about sustainable change for larger numbers of people. This paper highlights the negative impacts of the former trend on development and, more acutely, its incompatibility with the latter. The paper then highlights positive signs of a change in thinking in development, which has begun to move towards a more pragmatic and contextually nuanced approach to measuring progress, and identifies the need for further research in this area, calling for evaluation of approaches rather than a search for a silver bullet. The paper draws on a review of the evidence together with a number of key informant interviews with practitioners from the field.


Evidence-based Policy (EBP) and the results agenda have assumed orthodoxy in recent years throughout the public sector and in the last decade this transition has been particularly rapid in the field of development. A concurrent trend in development has been the recognition of the value of a more systemic approach to development issues: simultaneously addressing a complex range of interlinked problems to bring about sustainable change for greater numbers of poor people. The conflict that arises from the coexistence of these trends has yet to be tackled in either literature or practice. This paper aims to formally identify these trends and the incompatibilities of their parallel employment, and to set an agenda for addressing this conflict.

The paper contributes to an on-going discussion in development about the uses and misuses of results and evidence. It is divided into four main sections addressing two distinct gaps in the literature.

Firstly, a brief history of EBP and the results agenda and how they are realised in development is provided, bringing together the identification of this trend in the literature across several fields.

Secondly, the impact of this on development policies, programme design, and outcomes, is examined, using the analogy of evidence-based medicine to illustrate the incompatibilities of the origins and current manifestations of the results agenda.

Thirdly, the trend towards more cohesive, complementary development interventions aimed at sustainable and scalable change in a system is outlined. The particular ways in which the move towards a more prescriptive results agenda and EBP is inconsistent with this parallel trend are then examined. Here, case studies are used to draw together the preceding sections of the paper, illustrating how programme design has been influenced by EBP in systemic change programmes and the challenges such programmes face in complying with it.

The concluding section then draws together some reflections on the paper and suggests how the logical inconsistencies examined might be reconciled. The paper recognises the progress made in some areas by some agencies and calls for a research agenda covering how evaluation thinking and methodologies can be developed to capture systemic change.

The methodology used for the paper consists primarily of an examination of the literature but also engagements in the field and with policy and programme development, including some case study data from key informant interviews. The combination of primary and secondary data helps in identifying trends but it should be noted that the degree to which these trends have been realised is by no means universal across development actors or subfields.

EBP and the Results Agenda

‘Results’ and ‘evidence’ terminology is often fluid and confused. For the purposes of this paper, results are used to assess the success or failure of given activities, while evidence is employed strategically to demonstrate the case for alternative forms of intervention.

Aspects of EBP can now be found across the public and private sectors, across academic disciplines and anywhere that management decisions have to be made and, indeed, evidence of EBP and the results agenda is far stronger than the evidence for it (Nutley et al., 2000; Black, 2001; Solesbury, 2001; Pawson, 2004b). New Labour’s 1997 manifesto slogan ‘what counts is what works’, which typifies the utilitarian turn in policy making, is far more evident in modern policy discourse and practice than examples of research counting whether ‘what works’ still ‘works’ and whether it works in other contexts. The slogan perpetuates perceptions of a silver bullet approach to public policy, belying the often complex systems underpinning social change.


Any perception that results and evidence as tools in management and planning are new phenomena is false. A long and cyclical history, formally identified in the literature as at least two centuries old, includes payment by results (Rapple, 1994) and ‘Value for Money’ (VfM) (Madaus et al., 1987; Eyben, 2012). Evidence discourse is of a similar age, originating in experimental medicine (Bernard, 1957), but ‘evidence-based medicine’ and its tool of choice, the Randomised Controlled Trial (RCT), were not developed until the 1970s (Sackett et al., 1996). The 1960s saw a revival of the results agenda in Anglophone countries, reflective of a broader positivist turn in the social sciences (Espeland, 1997; Johnsen, 2005).

New Public Management (NPM), which emerged throughout the developed world in the 1970s, remains a paradigm of public sector management whose influence, discourse, and practices have shifted, strengthened, and weakened since then (Hood, 1991; Hopwood and Miller, 1994; Gray and Jenkins, 1995; Power, 1997; Hood and Peters, 2004). The global political polarisation of the 1970s and 1980s meant that the influence of ‘evidence’ in policy was secondary to the influence of ideology. Trade with certain countries, for example, was considered undesirable if it came at the expense of what was seen as right. The Anglo-American shift to the left of the 1990s, together with the end of the Cold War, resulted in the promotion of a perceived objectivity and pragmatism in policy-making, of which EBP was a major part (Black, 2001; Solesbury, 2001; Parsons, 2002; Sanderson, 2003; Pawson, 2004a). Since this reinvigoration of NPM in the mid-1990s, the trend has been applied more strongly within existing fields and extended to new fields of public policy (Ferlie, 1996; Pawson, 2004b; Haque, 2007; Lapsley, 2009).


The rationales for results and evidence are clear and distinct. Results are used to ensure that people, programmes, assets etc. are performing to the required level in a defined and measurable way. Evidence is used to ensure that empirically derived knowledge is obtained on which of a range of alternative strategies for performing a given task is most effective, according to a pre-defined set of objectives, so that this strategy can be replicated and, if necessary, adapted.

There are three primary reasons for the collection of evidence and results. Firstly, accountability – while efficient markets hold the private sector to account for performance, a lack of information on performance may result in inefficiencies in the public sector (Minogue et al., 1998; Barzelay, 2001; Lapsley, 2009). Secondly, efficacy – given limited resources, it is important to know where money can be spent to ‘do the most good’ (Bovaird and Davies, 2011). Thirdly, learning – determining the impacts of interventions so that lessons can be transferred to alternative contexts (Feinstein and Picciotto, 2000; Hall et al., 2003).

In medicine, where the drive towards EBP originated, the population to whom the evidence is to be applied is biologically similar if not homogeneous. The ‘treatment’ too is universal; each case will receive the same dose of the same medicine, delivered in the same way. Lessons learned are transferable across space and time. Such methods spread toward social policy in the 1990s with the theory that behavioural change, taken across a large enough sample, was as homogeneous as biological change.

A key differentiation between results and evidence in rationale is in the principles underlying how they are employed which relates strongly to Goodhart’s law; once a measure becomes a target it ceases to be a good measure (Elton, 2004). The rationale behind evidence relies heavily on performance being accurately measured and not used as a target whereas the rationale behind the results agenda relies on targets being set and then performance being measured against them.

Current Features in Development

Aid budgets increased dramatically through the 1990s and into the new millennium (OECD, 2013). With increased funding came increased scrutiny and, in the wider context of NPM, this meant development became a more transparent, measured and target-driven sector. A change in public perceptions was, perhaps, the strongest driver behind the change in donor approaches to aid. In the 1980s, aid was characterised as altruistic and compassionate, which fulfilled domestic political objectives of donor governments. Some influential critiques (Easterly, 2002; Easterly, 2006; Clark, 2008) led some to question the efficacy of aid in reducing poverty while the global financial crisis challenged its affordability (Te Velde et al., 2008; Dang et al., 2009).

High-level OECD meetings on aid effectiveness were held in Rome (2002) and Paris (2005), leading to The Paris Declaration on Aid Effectiveness of 2005.

The first two of these points should have meant a shift towards direct budgetary support while the final two are of importance to the way EBP has increased in prominence (De Renzio, 2006). Point four formalised results-based management in development (Strathern, 2000).

Practically, the shift to an accountability culture has manifested itself in a variety of ways. At a programme design level the shift has worked in parallel to the roll-back of the state, meaning the number of directly contracted staff in major donor organisations has been greatly reduced (Easterly and Pfutze, 2008; Mehrotra, 2009; ICAI, 2012). These mutually reinforcing trends have meant a proliferation of outsourcing in the development process with programme design, implementation, and evaluation all now put out to competitive tender to NGOs, development consultancies, and multi-sector, multinational firms (Domberger and Rimmer, 1994; Stubbs, 2003; Altenburg, 2005; Harland et al., 2005). When combined with the general budgetary support to developing country governments that emerged from the Paris Declaration, direct control of donor staff over the trajectory and success of programmes has been greatly reduced. Indeed, the global financial crisis had the counterintuitive effect of increasing aid from $104bn in 2007 to $134bn in 2011, meaning fewer staff have more money to disburse (OECD, 2013). It is perhaps inevitable, then, that development has succumbed to the trend of ‘management by numbers’.

Current features of the results agenda include the introduction of ‘milestone payments’, incorporated from individual employees through to contract management, whereby a proportion of the fee is put at risk against the achievement of predefined targets. Indeed, in a recent business case for an innovative programme in West Africa, the donor stated that while they were fully aware that these simplistic output targets would not help them achieve broader goals, they were obliged to make payment contingent upon them (Key informant interview with programme manager – herein, empirical observations from development programmes have been anonymised to protect the interests of key informants). As part of the Department for International Development (DfID) Business Plan 2010-2015, the department now publishes figures on 15 quantitative measures of performance, such as a number of people, an amount of money spent, or a number of items purchased.

The quest for evidence has, to a certain extent, been employed in place of oversight and technical coordinating functions in development strategy. There has been a drive for a ‘solution’ to every development problem, often referred to as the learning objective and, here, it is the RCT, together with other experimental and quasi-experimental methodologies, that forms the primary weapon in the armoury of those seeking to compare impact across development programmes. Indeed, an implicit hierarchy of evidence has emerged. For donor staff with limited time and expertise in a given context, numbers have, once again, obtained primacy. Qualitative data is not considered to be evidence of change but, rather, an explanation of the change that the numbers provide evidence of (Kusek and Rist, 2004). However, this is by no means unique to development. For example, in the trialling of a new drug, patients’ reports of adverse reactions were not considered significant as they could not be detected quantitatively (van Grootheest and de Jong-van den Berg, 2004).

Another practical manifestation which transcends evidence and results is the ‘logical framework’ or logframe. The logframe is intended to allow a theoretical causal chain to be drawn between the activities that a programme undertakes, the expected outcomes of those activities (monitoring and results), and the impact that those outcomes have on poverty. In practice the use of the logframe has constrained adaptability and innovation in development programmes (Chambers, 1997; Gasper, 1997; Hummelbrunner, 2010).

Practitioners in enterprise development have attempted to accommodate the results agenda through the creation of the Donor Committee for Enterprise Development (DCED) Standard for Results Measurement. This is a tool for both donors and practitioners to provide formalised advice on best practice acceptable to both parties (Thomas, 2008).

DfID’s recently produced policy on evaluation states of RCTs:

We recognise the usefulness of this type of work and support an increase in rigorous impact evaluations more generally, but as one tool in the evaluation tool box, which must sit alongside evidence gathered through other evaluation methodologies (DfID, 2009; 14)

However, in the terms of reference for a recently commissioned impact evaluation of a market development programme it was stated that the evaluators will ‘use experimental and quasi-experimental methods, especially randomised control trials, wherever possible’ (Interview with evaluation programme manager).

Conversely, a DfID funded working paper (Stern et al., 2012) examined this very issue and called for a more nuanced and context specific approach to the classification, use of and demands for evidence. Still, similar stipulations as to the necessity of experimental and quasi-experimental methods persist for a wide range of donors.

The increased financial scrutiny that accompanies a broader public sector resource constrained environment impacts further on the desire for comparability across development programmes as characterised by the re-emergence of VfM objectives, which all DfID programmes are now urged to demonstrate (DfID, 2011; 2; Eyben, 2012).

A further point of note is that this drive towards accountability has not stopped at the donor level. The recently created Independent Commission for Aid Impact (ICAI) aims to ‘[C]hampion the use of independent evidence, to help the UK spend aid on what works best’. While the ICAI too is held accountable, to the International Development Select Committee who in turn are accountable to their constituents, there is an interesting paradox in that the ICAI states on its website that it ‘accounts for its own performance’ while ‘championing independent evidence’ (ICAI, 2013). Numerical targets are also in evidence at the ICAI, which seeks to publish 10-15 reports a year on its website.

Implications for Development Programming and Outcomes

The emergence of EBP and the results agenda has significantly affected development programme design and implementation and, ultimately, outcomes for poor people. These changes in the development sector were implemented with the explicit objectives of transparency, accountability, and improved outcomes relative to investment. The consequences of how these laudable objectives were implemented, however, have been far less positive. If development is viewed as a linear chain of power relationships then recent transitions have shifted power gradually up the chain away from those ‘doing development work’.

The global organisation of aid and how it is distributed varies by country and across time. Vielajus et al. (2009) provide a good summary of how development agencies are organised, and of their accountability structures, in the UK, Germany, France, and Sweden. However, since that time, internal and external bodies such as the ICAI and USAID’s Partner Compliance and Oversight Division have been set up to provide additional levels of accountability.

Figure 1 represents a stylised version of accountability in the development sector in the UK. As the results agenda has proceeded, the weighting of risks undertaken by each party has been pushed downwards. DfID reports on the success of its work according to a narrowly defined set of indicators. Increasingly contractors’ payments are made contingent on ‘milestone payments’ and contractors make a proportion of their (often freelance) employees’ salaries contingent upon satisfaction of these criteria. The primary way to minimise this risk for all parties is to set modest, achievable targets that are easily measurable, static, and clearly defined. Consequently, EBP means programmes are designed where the implementer has maximum control over outputs.

However, contractors are not the primary losers in this new configuration of accountability. There are two notable absentees from Figure 1. The first is developing country governments, whose influence has decreased despite the Paris Declaration: where there is no explicit requirement for their inclusion, they are seen as an uncontrollable variable. If a seven-year development programme is set the target of increasing access to road networks for rural communities, a contractor whose payment is contingent on success is unlikely to work through government to identify impediments to access and build capacities, and will simply build the roads itself. However, the most important set of absentees in Figure 1 are the intended beneficiaries. EBP and the results agenda do not frequently account for the genuine impacts development programmes have on the lives of poor people; not only the minority that are ‘reached’ by development programmes but the vast majority whose lives are not influenced by this direct transferral of goods and services.

Comparing Apples and Existentialism

Central to the growth of EBP has been an assumption that methods drawn from natural sciences can be applied with equal efficacy in development, despite the differences outlined above. Taking natural science methods for use in development is to misappropriate them; to believe that ‘apples can be compared with existentialism’. A direct comparison of medical science and development illustrates this point.

EBP, the results agenda, and development programming have become increasingly incompatible. In medicine, a professor will run a laboratory with multiple researchers. Research on ‘what works’ begins at the theoretical level and is then expanded to the cell level, conducted in live trials on mice before limited human trials can begin. Results are analysed including considerations of cost effectiveness before a drug will be classified as safe. Drugs can fail phase II or phase III trials if side effects are witnessed which impact on the continued use and uptake of the treatment. Leaving aside fundamental differences identified above - homogeneity/plurality of ‘problems’ (illnesses/poverty causes), ‘treatments’ (drugs/incremental multifaceted social, political, and economic interventions), and ‘treatment groups’ (humans/poor people, societies, communities, industries etc.) - there remains a critical difference in how these two fields are approached which impacts on the relevance of EBP.

In medicine, a good result may not be a cure for a disease but a positive indication of the areas in which treatments may be developed, with a great deal of further research, to address some forms of that disease. It is on this data which evidence-based medicine is founded. Conversely development programmes are evaluated and compared in relation to far higher level objectives, with far fewer examples on which to draw.

One of the more highly-specified systematic reviews in development examines the economic impact of conditional cash transfer (CCT) programmes (Kabeer et al., 2012). It bases conclusions on 11 programmes in nine countries. All bar one of the studies are of programmes in Latin America. The review faces the common problem that the programmes being compared differ widely in their target variables, treatment groups, years of operation etc. The majority of conclusions are intuitive, such as the fact that CCTs are more effective in reducing child labour for older, male children as they are more likely to have been in work prior to the CCT. Where conclusions were not intuitive, they were found to be highly geographically differentiated:

...we find that the impact of transfers on adult labour supply varied by gender, size and duration of transfer and type of employment (Kabeer et al., 2012; 20-21)

This conclusion is made despite 10 of the 11 programmes being based in Latin America, giving a certain degree of cultural comparability. There are also areas where the conclusions of the review are far more dubious when used to influence policy. In the area of household consumption, a difference is suggested between how female and male recipients use the CCT, with transfers to females resulting in higher quality nutritional intake and increased purchases of children’s clothing. Ultimately, despite all of the caveats contained in the document the conclusion is drawn that:

CCTs appear to be an effective measure for achieving what they were designed to achieve: promoting children’s education and reducing child labour among poor and marginalised groups (Kabeer et al., 2012; 44)

It is in the utilisation of such conclusions that policy is influenced. The source for this particular decisive conclusion is a set of 15 studies across seven countries, six of which are in Latin America (This is a subset of the broader systematic review paper referring only to the effect of CCTs on child labour and education), with a range of impact variables and methodologies. This is the medical equivalent of stating that ‘chemotherapy cures cancer’ based on research from a single lab’s animal testing.

Unlike in the medical scenario, in development, the ‘solutions’ being sought are the equivalent of a panacea for all diseases or, arguably, something with a greater number of context specific causal factors. The number of ‘observations’ being used to make these generalisations represents an incalculably small fraction of those used in medicine. The causes or extent of poverty are not binary, and nor are the solutions. Social problems are complex and varied and while some in development are recognising this complexity, reconciling this with the drive towards accountability and EBP has yet to be accomplished.

Systemic Change, EBP and the Results Agenda

Official Development Assistance (ODA) totalled $133.5bn in 2011 with approximately 2.5bn people living on less than $2 per day (OECD, 2012). So, at current levels direct aid could offer under $0.15 to each poor person per day. In the context of rising global food prices and increasing population pressure, this would have minimal impact on the majority of people’s lives. Therefore, effective aid has to stimulate wider change. There has been an increasing recognition in recent years that this is not likely to happen through traditional direct delivery of ‘charity’ (Pronk, 2001; Rogerson, 2011). Instead donors have begun, discursively at least, to incorporate objectives of systemic change into some of their programmes in an increasing range of sectors.
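The per-person figure quoted above can be checked with a short calculation. The sketch below is illustrative only: it uses the two totals given in the text, and it assumes a 365-day year and an even split of all ODA across everyone living on less than $2 per day, neither of which is stated in the source. The function and variable names are hypothetical.

```python
# Figures quoted in the text (OECD, 2012).
ODA_TOTAL_USD = 133.5e9        # total ODA in 2011, US dollars
PEOPLE_UNDER_2_USD = 2.5e9     # approximate number of people living on less than $2 per day

# Assumption (not from the source): a 365-day year.
DAYS_PER_YEAR = 365


def aid_per_person_per_day(total_aid, people, days=DAYS_PER_YEAR):
    """Spread a yearly aid total evenly over a population and over the days of the year."""
    return total_aid / people / days


per_day = aid_per_person_per_day(ODA_TOTAL_USD, PEOPLE_UNDER_2_USD)
print(f"Aid per poor person per day: ${per_day:.3f}")
```

Under these assumptions the result sits just below $0.15, consistent with the figure in the text. Any real distribution would be lower still once delivery and administration costs are taken into account.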

Systemic Change, Making Markets Work for the Poor (M4P), global production networks and complexity science are all ways of understanding development processes as complex adaptive systems rather than linear input-output models. Since the early 2000s, this has been increasingly recognised amongst donors (see for example USAID’s Feed the Future, AusAID’s PRISMA, and the multi-donor Katalyst programme). Systemic change was conceived of as a way to leverage aid inputs through emulation, utilising local surpluses in the recipient country. Thorough analysis of the incentives that different actors within a system have to alter their behaviour in ways that result in better outcomes for poor people allows development programmes to intervene only where this change is plausible. Consequently, post-intervention, incentives remain aligned with pro-poor outcomes, making changes in a system sustainable. Scale is achieved through demonstration, imitation, and adaptation (Elliot et al., 2008). The potential for systemic change to deliver lasting pro-poor outcomes to large numbers of people has now been recognised in fields including health, education, and finance (Bloom et al., 2012; Peters et al., 2012; Ledgerwood et al., 2013).

Incompatibilities between EBP and Systemic Change

The drive for EBP and the results agenda and the drive for systemic change have occurred in parallel and the incompatibilities of these agendas have yet to be reconciled. This is perhaps reflective of a broader trend in which donors have become more demanding as budgets, and consequently scrutiny, have increased. As political concerns gain public support, they are quickly added to the specifications of what donors want. The climate change agenda means every intervention must be climate neutral/positive (Klein et al., 2007). Gender considerations mean programmes must disproportionately benefit women (Kabeer, 2003; Moser, 2005). The fragmented nature of research, communications, and policy divisions within donors results in a corporate cognitive dissonance, meaning that this wish list grows ever longer with no conception of whether the concurrent satisfaction of these goals is possible or desirable.

There is a clear disconnect between NPM, which seeks to set predefined numerical targets for the duration of a programme and manage through their enforcement, and a systemic change approach which seeks to implement short feedback loops and bring about institutional and behavioural change through detailed and ongoing analysis of local contexts leading to iterative and bespoke solutions which adapt over time (Blackman, 2001).

A systems understanding of development represents a typical example of the complex unbounded problem (Dovers and Handmer, 2012) which makes assessment through the rigid framework of NPM and its tools highly challenging. One of the primary issues is that there is no shared agreement about the problem (Harrison, 2000).

These are problems associated with setting predefined targets in situations which require adaptability as the context becomes clearer and external circumstances change. Ultimately, such aspects of the results agenda address two of the three elements of its rationale. Firstly, the results agenda addresses accountability. Contractors are held accountable for how well they execute the tasks which they have been set, and the results agenda performs this task well. However, it cannot address whether the contractors are being held accountable for the things most likely to achieve their ultimate goals. In monitoring, and indeed paying for, programmes in this way, implementers have no scope to deviate from what was agreed at the start of the programme, whatever the developmental implications of this might be. Secondly, the results agenda addresses issues of efficacy. It measures which of the predefined strategies resulted in the greatest change in the predefined indicators. In the monitoring of these programmes, however, it does not allow for the measurement of any additional indicators using alternative methodologies, and so fails to capture externalities.

Perhaps the most significant problem with EBP comes in using the evidence gathered from these activities for the learning aspect of the rationale. Systemic change programmes are, above all, context specific. If a programme that had worked with a fertiliser producer in Nigeria to deliver training in usage at the point of sale was shown, verified by an RCT, to have increased yields more than any other programme with the same goals, this does not mean that it would be the most successful strategy in another part of Nigeria, let alone in other developing countries. Social, economic, cultural, and political circumstances dictate outcomes and it is the iterative and flexible nature of systemic change programmes that allows them to succeed. Moreover, in an intervention such as this, working through partners is highly dependent on personalities and the nature of the companies or organisations that interventions are implemented through. The sustainability and scale objectives of systemic change programmes are reliant upon strong and capable partners that, by the end of the programme period, will be able to maintain and augment the change initiated by the development programme’s intervention in the absence of external assistance. The absence of such partners in an alternative context means that different ways of addressing the same problem may have to be employed.

In addition to alternative solutions, the very nature of the problem may also differ in alternative contexts. Low use of fertiliser in a different area may be due to low capital availability and a lack of micro-credit facilities, as one of a wide range of possible causal factors. Currently, the entire discourse characterises societies and the humans of which they are comprised as a homogeneous mass with identical behaviours and responses. Where the incentives that underlie failures in systems are considered, they too are considered to be universal despite countless examples to the contrary.

What follows is a set of three examples that serve as case studies of the incompatibility of systemic change with how EBP and the results agenda are currently implemented:

Independent Contractors and Online Databases

In one programme with systemic change objectives, a constraint on the system in which independent building contractors were operating was that small scale producers were not linked with the buyers where there was nascent demand. As part of the need to identify targets during the design process, one intervention was identified whereby a database would be created to overcome this constraint. The theory was that buyers were unaware of the quality and volume of products and services available and so the database would lead to better linkages and increased incomes. A target was set to register 165 contractors on the database and was subsequently exceeded substantially by the implementer, scoring them highly in the annual review process.

Unfortunately, the review found that there had been no impact on poverty as no buyers used the database, internet connection and computers were absent and the construction workers did not see its purpose. The results agenda consequently produced a false positive. The intervention had no defined exit strategy and little prospect for the change to continue beyond the period of donor support. However, having previously identified a criterion on which the programme would be assessed, irrelevant to its broader strategic goals, the programme was bound to succeed by its own measures (DfID, 2012).

Where’s the Evidence?

At the level of evidence on which to base policy, one development consulting firm was set the task of conducting a significant multi-year evaluation of a major systemic change programme in Asia with a view to assessing ‘what works’. The conclusions of the evaluation were highly critical of the programme’s impact, despite fundamental errors in the way the evaluation was conducted.

Firstly, the evaluation was assessing the programme based on criteria which were inappropriate for systemic change programmes, counting direct beneficiaries over systemic impacts. Ideally, a systemic change programme would not be visible to those that benefit from it in the long term, working through partners to catalyse change.

Secondly, the programme was criticised for not fully addressing the issues of producers in specific sectors. However, the programme’s focus had moved on as it began to concentrate on different sectors with pro-poor growth potential and interlinked markets where systemic constraints were present.

Thirdly, the evaluation was conducted soon after interventions had ended, significantly underestimating impact. Systemic change programmes result in incremental improvement, with a significant lag between the changes implemented and their effects being realised, and so the timing of the evaluation was never likely to capture this. There was, therefore, a misunderstanding in the way the evaluation was conducted in relation to the programme’s objectives. This calls into question the parameters of ‘independent’ evaluations, particularly in the context of fluid, complex, and dynamic programmes (Interview with key informant involved with the programme).

What’s the Problem?

The contested subject of malarial bednets is one where the incompatibilities of EBP and systemic change are clear. Whilst this paper will not engage with the case study in detail as it is documented extensively elsewhere, it is useful to note the dynamics of the debate and its relevance to this paper. A number of studies have used RCTs to examine the effects of different distribution models, finding the assumptions made about paid distribution models (recipients valuing the nets more highly, reduction in resale etc.) grossly overstated, if not entirely false (Dupas, 2009; Hoffmann et al., 2009; Karlan et al., 2009a; Karlan et al., 2009b; Banerjee et al., 2010; Cohen and Dupas, 2010; Dupas, 2010; Bates et al., 2012). Conversely, other research (Abdulla et al., 2001; Lengeler et al., 2007; Heierli and Lengeler, 2008) examines the effects of a wide range of intervention models across a number of countries, finding pluralistic results in the models of ‘what works’ and what does not. In Tanzania, for example, a private sector model has proven highly successful and sustainable for different at-risk groups and, additionally, has provided increased incomes for those involved in the manufacture of the nets.

The requirement to distil programmes to binary quantitative assessments means that, in the context of a short-term programme with short-term goals, free distribution of nets results in a greater reduction in rates of malaria than a programme which aims to build local capacity to cater to this demand, and funding is allocated accordingly. This evidently has consequences for the sustainability and scale of change that can be effected. This is a clear example of the point referred to earlier in this section, where there is no shared agreement on the problem (Harrison, 2000). If the problem is defined as being the incidence of malaria in a given area within a given timeframe, then there are definitive ways to evaluate approaches to that. If the problem is defined as the inability of a local economy and health system to cater to its health needs in the long term – including the long-term incidence of malaria – a different approach is required, together with different evaluation techniques.

EBP is, then, striving for the wrong goal; results are serving as a disciplinary tool which compromises the capacity for innovation amongst programme designers and implementers, while evidence is striving for a solution to development challenges. Instead, results could be used in tandem with evidence to assess the efficacy of different approaches to determining and tackling context specific problems, in line with broader development objectives.

Positive Signs and a Way Ahead

To this point the paper has documented the rise of EBP and the results agenda and the impact of this on development programming. Subsequently the paper identified a welcome move towards longer-term thinking and systemic change objectives in development programming but highlighted the incongruity of these two trends. Ultimately, this demonstrates the cognitive dissonance within the minds of policy makers.

This could lead to a nihilistic outlook on measurement within development programmes in general. However, when compared with the alternative of wasted money, ineffective programmes and poor results, it is clear that an alternative must be sought. Difficulty in measurement should motivate alternative approaches to measurement rather than abandoning an intervention. Ways of addressing this problem include a reappraisal of qualitative methods, triangulation of quantitative and qualitative methodologies and the incorporation of emerging techniques which aim to synthesise the two. While it is beyond the scope of the review to engage in thorough methodological discussions, participatory statistics and a range of other ways of quantifying qualitative data offer new and robust measurement techniques which take greater account of context.

Furthermore, the limitations and unintended impacts of the results agenda are now beginning to be recognised by some donors. USAID have, in recent years, begun to advocate the use of the ‘Degrees of Evidence Framework’ in agricultural value chains and finance programmes. Based on programme experience, evaluators began to recognise the need for flexibility in the application of methodologies to different social, economic and cultural contexts. The framework provides those working on programmes with a means of appraising the best practical way to obtain robust data within the context and resource constraints of any given situation. This represents progress towards a better understanding of how to evaluate change. The advocacy of triangulation and pragmatism in the assessment of methods is also positive. However, the hierarchical assessment of methods within the framework is not appropriate to the complex unbounded problems of the majority of systemic change programmes, and its emphasis on targets means that it still fails to capture many of the externalities which are in fact one of the primary objectives of systemic change programmes (Creevey et al., 2010).

The DCED standard has shown willingness amongst practitioners to measure the impact that their programmes have. However, the standard’s emphasis on the need for measurement to be an internal function runs contrary to the drive amongst many donors for independent evidence and, increasingly, the minimum is not good enough. Furthermore, the standard is indicative rather than instructive and, as a private sector development-focused initiative, there is very little indication of how to fully capture change at a system wide level.

More widely, evaluations are, in some cases, encouraged as iterative, collaborative, and adaptive processes developed between evaluators and implementing parties. In a smaller number of cases, innovative approaches to the measurement of complex development programmes are being introduced and, more importantly, accepted by donors. This could lead to the development of a more pragmatic and less positivist best-practice standard for results measurement in systemic change programmes, which can begin to contribute to the ‘evidence’ on which policy is increasingly based. However, such evaluation designs remain rare and rely on courageous individuals within donors to advocate for their employment in the face of normative voices within their organisation.

In order to resolve the clear incompatibilities between how EBP and the results agenda are currently realised in development and the parallel trend towards a systemic approach, there is a clear need for further research and for conceptual and methodological development on how to capture change without compromising programme objectives. The majority of development programmes are open to measurement being used as a tool in the improvement of development outcomes. However, this should not occur at the expense of innovation and pursuing optimal developmental outcomes. Evidence needs to be redefined as a means of assessing approaches to development so that lessons can be learned and adapted to different contexts, rather than a means of assessing which tool should be applied universally.


ABDULLA, S., KIKUMBIH, N., MASSANJA, H., MSHINDA, H., NATHAN, R., SAVIGNY, D., ARMSTRONG-SCHELLENBERG, J. & VICTORA, C. 2001. Mosquito nets, poverty and equity in rural southern Tanzania. Ifakara Health Research and Development Centre, Tanzania.

ALTENBURG, T. The private sector and development agencies: How to form successful alliances. Critical issues and lessons learned from leading donor programs. 10th International Business Forum, 2005.

BANERJEE, A. V., DUFLO, E., GLENNERSTER, R. & KOTHARI, D. 2010. Improving immunisation coverage in rural India: clustered randomised controlled evaluation of immunisation campaigns with and without incentives. BMJ: British Medical Journal, 340.

BARZELAY, M. 2001. The new public management: Improving research and policy dialogue, Univ of California Press.

BATES, M. A., GLENNERSTER, R., GUMEDE, K. & DUFLO, E. 2012. The Price is Wrong. Field Actions Science Reports. The journal of field actions.

BERNARD, C. 1957. An introduction to the study of experimental medicine, Courier Dover Publications.

BLACK, N. 2001. Evidence based policy: proceed with care. BMJ: British Medical Journal, 323, 275.

BLACKMAN, T. 2001. Complexity theory and the new public management. Social issues, 1.

BLOOM, G., KANJILAL, B., LUCAS, H. & PETERS, D. 2012. Transforming health markets in Asia and Africa, Routledge.

BOVAIRD, T. & DAVIES, R. 2011. Outcome-Based Service Commissioning and Delivery: Does it make a Difference? Research in Public Policy Analysis and Management, 21, 93-114.

CHAMBERS, R. 1997. Whose Reality Counts?: Putting the first last, Intermediate Technology Publications Ltd (ITP).

CLARK, G. 2008. A farewell to alms: a brief economic history of the world, Princeton University Press.

COHEN, J. & DUPAS, P. 2010. Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria Prevention Experiment. Quarterly Journal of Economics, 125, 1.

CREEVEY, L., DOWNING, J., DUNN, E., NORTHRIP, Z., SNODGRASS, D. & COGAN WARES, A. 2010. Time to Learn: An Evaluation Strategy for Revitalised Foreign Assistance. In: USAID (ed.).

DANG, H.-A., KNACK, S. & ROGERS, H. 2009. International aid and financial crises in donor countries, World Bank.

DE RENZIO, P. 2006. Aid, budgets and accountability: A survey article. Development Policy Review, 24, 627-645.

DFID 2009. Building the Evidence to Reduce Poverty: The UK's Policy on Evaluation for International Development. In: DEVELOPMENT, D. F. I. (ed.). London.

DFID 2011. How to Note: Reviewing and Scoring Projects. In: DEVELOPMENT, D. F. I. (ed.). London.

DFID 2012. Support to the Construction and Real Estate Sector Growth and Employment in States (GEMS) Programme - GEMS 2 Annual Review. In: DEPARTMENT FOR INTERNATIONAL DEVELOPMENT (ed.). London.

DOMBERGER, S. & RIMMER, S. 1994. Competitive tendering and contracting in the public sector: A survey. International Journal of the Economics of Business, 1, 439-453.

DOVERS, S. & HANDMER, J. 2012. The handbook of disaster and emergency policies and institutions, Routledge.

DROOP, J., ISENMAN, P. & MLALAZI, B. 2008. Paris Declaration on Aid Effectiveness: Study of Existing Mechanisms to Promote Mutual Accountability (MA) between Donors and Partner Countries at the International Level: A Study Report. Oxford: Oxford Policy Management.

DUPAS, P. 2009. What matters (and what does not) in households' decision to invest in malaria prevention? The American Economic Review, 224-230.

DUPAS, P. 2010. Short-run subsidies and long-run adoption of new health products: Evidence from a field experiment. National Bureau of Economic Research.

EASTERLY, W. 2002. The cartel of good intentions: the problem of bureaucracy in foreign aid. The Journal of Policy Reform, 5, 223-250.

EASTERLY, W. 2006. The white man's burden: why the West's efforts to aid the rest have done so much ill and so little good, Penguin.

EASTERLY, W. & PFUTZE, T. 2008. Where does the money go? Best and worst practices in foreign aid. Journal of Economic Perspectives, 22.

ELLIOT, D., GIBSON, A. & HITCHINS, R. 2008. Making markets work for the poor: rationale and practice. Enterprise Development and Microfinance, 19, 101-119.

ELTON, L. 2004. Goodhart's Law and performance indicators in higher education. Evaluation & Research in Education, 18, 120-128.

ESPELAND, W. N. 1997. Authority by the numbers: Porter on quantification, discretion, and the legitimation of expertise. Law & Social Inquiry, 22, 1107-1133.

EYBEN, R. 2012. Relationships for aid, Routledge.

FEINSTEIN, O. N. & PICCIOTTO, R. 2000. Evaluation and Poverty Reduction: Selected Proceedings from a World Bank Seminar, World Bank Publications.

FERLIE, E. 1996. The new public management in action, Oxford University Press, USA.

GASPER, D. 1997. 'Logical frameworks': a critical assessment: managerial theory, pluralistic practice. ISS Working Paper Series/General Series, 264, 1-46.

GRAY, A. & JENKINS, B. 1995. From public administration to public management: reassessing a revolution? Public administration, 73, 75-99.

HALL, A., RASHEED SULAIMAN, V., CLARK, N. & YOGANAND, B. 2003. From measuring impact to learning institutional lessons: an innovation systems perspective on improving the management of international agricultural research. Agricultural Systems, 78, 213-241.

HAQUE, M. S. 2007. Revisiting the new public management. Public Administration Review, 67, 179-182.

HARLAND, C., KNIGHT, L., LAMMING, R. & WALKER, H. 2005. Outsourcing: assessing the risks and benefits for organisations, sectors and nations. International Journal of Operations & Production Management, 25, 831-850.

HARRISON, T. 2000. Urban policy: addressing wicked problems. What works? Evidence Based Policy and Practice in Public Services. London: Policy Press.

HEIERLI, U. & LENGELER, C. 2008. Should Bednets be Sold, or Given Free? Swiss Agency for Development and Cooperation, Berne, Switzerland.

HOFFMANN, V., BARRETT, C. B. & JUST, D. R. 2009. Do free goods stick to poor households? Experimental evidence on insecticide treated bednets. World Development, 37, 607-617.

HOOD, C. 1991. A public management for all seasons? Public administration, 69, 3-19.

HOOD, C. & PETERS, G. 2004. The middle aging of new public management: into the age of paradox? Journal of public administration research and theory, 14, 267-282.

HOPWOOD, A. G. & MILLER, P. 1994. Accounting as social and institutional practice, Cambridge, Cambridge University Press.

HUMMELBRUNNER, R. 2010. Beyond Logframe: Critique, variations and alternatives. Beyond Logframe; Using Systems Concepts in Evaluation, 1.

ICAI 2012. DfID Report and Annual Accounts.

ICAI. 2013. Role and Core Values [Online]. Available: [Accessed 01/07/13 2013].

JOHNSEN, Å. 2005. What does 25 years of experience tell us about the state of performance measurement in public policy and management? Public Money and Management, 25, 9-17.

KABEER, N. 2003. Gender mainstreaming in poverty eradication and the millennium development goals: A handbook for policy makers and other stakeholders, London: Commonwealth Secretariat; Ottawa: International Development Research Centre.

KABEER, N., PIZA, C. & TAYLOR, L. 2012. What are the economic impacts of conditional cash transfer programmes? A Systematic Review of Evidence. In: EPPI CENTRE, SOCIAL SCIENCE RESEARCH UNIT, INSTITUTE OF EDUCATION & UNIVERSITY OF LONDON (eds.).

KARLAN, D., GOLDBERG, N. & COPESTAKE, J. 2009a. 'Randomized control trials are the best way to measure impact of microfinance programmes and improve microfinance product designs.'. Enterprise Development and Microfinance, 20, 167-176.

KARLAN, D., GOLDBERG, N. & COPESTAKE, J. 2009b. Randomized control trials are the best way to measure impact of microfinance programs and improve microfinance product designs. Enterprise Development and Microfinance, 20, 167-176.

KLEIN, R. J., ERIKSEN, S. E., NÆSS, L. O., HAMMILL, A., TANNER, T. M., ROBLEDO, C. & O’BRIEN, K. L. 2007. Portfolio screening to support the mainstreaming of adaptation to climate change into development assistance. Climatic change, 84, 23-44.

KUSEK, J. Z. & RIST, R. C. 2004. Ten steps to a results-based monitoring and evaluation system: a handbook for development practitioners, World Bank Publications.

LAPSLEY, I. 2009. New Public Management: The Cruellest Invention of the Human Spirit? Abacus, 45, 1-21.

LEDGERWOOD, J., EARNE, J. & NELSON, C. 2013. The New Microfinance Handbook: A Financial Market System Perspective, World Bank Publications.

LENGELER, C., GRABOWSKY, M. & MCGUIRE, D. 2007. Quick wins versus sustainability: options for the upscaling of insecticide-treated nets. The American journal of tropical medicine and hygiene, 77, 222-226.

MADAUS, G. F., RYAN, J. P., KELLAGHAN, T. & AIRASIAN, P. W. 1987. Payment by Results: An Analysis of a Nineteenth Century Performance-Contracting Programme. The Irish Journal of Education/Iris Eireannach an Oideachais, 80-91.

MEHROTRA, S. 2009. International development targets and official development assistance. Catalysing Development: A Debate on Aid, 161.

MINOGUE, M., POLIDANO, C. & HULME, D. 1998. Beyond the new public management: changing ideas and practices in governance, Cheltenham, Edward Elgar Pub.

MOSER, C. 2005. Has gender mainstreaming failed? A comment on international development agency experiences in the South. International Feminist Journal of Politics, 7, 576-590.

NUTLEY, S. M., DAVIES, H. T. & SMITH, P. C. 2000. What works? Evidence-based policy and practice in public services, London, Policy Press.

OECD. 2012. Development Aid Flows [Online]. Available: [Accessed 01 July 2013].

OECD 2013. Query Wizard for International Development Statistics. Organisation for Economic Cooperation and Development.

PARSONS, W. 2002. From muddling through to muddling up-evidence based policy making and the modernisation of British Government. Public policy and administration, 17, 43-60.

PAWSON, R. 2004a. Evidence Based Policy. Making realism work: Realist social theory and empirical research, 24.

PAWSON, R. 2004b. Evidence Based Policy. In: CARTER, B. & NEW, C. (eds.) Making realism work: Realist social theory and empirical research. London: Routledge.

PETERS, D. H., PAINA, L. & BENNETT, S. 2012. Expecting the unexpected: applying the Develop-Distort Dilemma to maximize positive market impacts in health. Health Policy and Planning, 27, iv44-iv53.

POWER, M. 1997. The audit society: Rituals of verification, Oxford, Oxford University Press.

PRONK, J. P. 2001. Aid as a Catalyst. Development and change, 32, 611-629.

RAPPLE, B. 1994. Payment by results: An example of assessment in elementary education from nineteenth century Britain. Library Publications, 5.

ROGERSON, A. 2005. Aid harmonisation and alignment: Bridging the gaps between reality and the Paris reform agenda. Development Policy Review, 23, 531-552.

ROGERSON, A. 2011. What if development aid were truly “catalytic”? ODI Background Note.

SACKETT, D. L., ROSENBERG, W. M., GRAY, J., HAYNES, R. B. & RICHARDSON, W. S. 1996. Evidence based medicine: what it is and what it isn't. BMJ: British Medical Journal, 312, 71.

SANDERSON, I. 2003. Is it ‘what works’ that matters? Evaluation and evidence‐based policy‐making. Research papers in education, 18, 331-345.

SOLESBURY, W. 2001. Evidence based policy: Whence it came and where it's going. ESRC UK Centre for Evidence Based Policy and Practice London.

STERN, E., STAME, N., MAYNE, J., FORSS, K., DAVIES, R. & BEFANI, B. 2012. Broadening the Range of Designs and Methods for Impact Evaluations: Report of a Study Commissioned by the Department for International Development. DFID. http://www.dfid.gov.uk/r4d/pdf/outputs/misc_infocomm/DFIDWorkingPaper38.pdf.

STRATHERN, M. 2000. Audit cultures: anthropological studies in accountability, ethics, and the academy, Hove, Psychology Press.

STUBBS, P. 2003. International non-state actors and social development policy. Global Social Policy, 3, 319-348.

TE VELDE, D. W., AHMED, M. M., ALEMU, G., BATEGEKA, L., CALÍ, M., CASTEL-BRANCO, C., CHANSA, F., DASGUPTA, S., FORESTI, M. & HANGI, M. 2008. The global financial crisis and developing countries. ODI Background Note. London: Overseas Development Institute.

THOMAS, S. 2008. Results Measurement and Programme-based Approaches (PBA). Methods and Instruments for the Evaluation and Monitoring of VET-systems, 55-61.

VAN GROOTHEEST, K. & DE JONG-VAN DEN BERG, L. 2004. Patients' role in reporting adverse drug reactions. Expert opinion on drug safety, 3, 363-368.

VIELAJUS, M., HUDSON, A., JONSSON, L. & NEU, D. 2009. The challenge of accountability for development agencies within their own countries and before their peers. In: AFD (ed.).


*this article is reposted from the Journal of Enterprise Development and Microfinance. Please cite as Taylor, B., 2014, In vogue and at odds: systemic change and new public management in development, Journal of Enterprise Development and Microfinance 25(4).

