Showing posts with label RCTs.

17 April 2025

Coaching is better than training, but there is still a question mark over scalability

"So should governments switch to frequent coaching sessions? Possibly, but the next step should first be to try this type of intervention at scale. 
Finding three highly skilled coaches is one thing, but you might need hundreds or thousands of them if you were to run a similar programme across an entire country. 
One potential route to scale is through new uses of technology. A study in Brazil found positive impacts of a virtual-coaching programme run via Skype, for example. 
But perhaps the most straightforward type of technology to go for is scripts, which this paper suggests have positive effects on learning both when presented through centralised training and intensive coaching."

10 January 2025

Experimental Conversations

When I was studying for my undergraduate degree, probably the most enjoyable book I read was one I stumbled across in the library (back in the day when you had to actually go to the library to find book chapters and physical copies of journals), called ‘Conversations with Leading Economists’. The conversational style - discussing in plain language how ideas came about and how theorists interacted with each other's ideas and with data - was an amazing breath of fresh air, and a world away from the weirdness of textbooks, which often appear to pass down strange and seemingly grossly unrealistic theories and models of the world as if they were some kind of natural law. The list of interviewees includes Milton Friedman, Robert Lucas, Gregory Mankiw, Franco Modigliani, Paul Romer, and Robert Solow.

That conversational style can probably be slightly more commonly found these days in the post-blogging social media world, but there are still plenty of important thinkers who don’t blog or write op-eds very frequently (they’re busy being important thinkers), so Timothy Ogden has provided the wonderful service of writing up a series of interviews with some of the leading voices in both academia and policy on the use of randomized evaluations and field experiments in development economics.


You can buy the book 'Experimental Conversations’ here.

Read the chapter with Angus Deaton for free here

And subscribe to Tim’s weekly newsletter here.

26 February 2025

Tom Kane on Education RCTs

"If our goal is to change behaviour and drive policies towards more effective solutions, what we have done so far is a complete failure. People who are running the What Works Clearing House don’t even have a theory [of how evidence would affect policy], or to the extent that they have a theory, its been proven wrong. … We’re just deluding ourselves if we think the 5 year, $15 million studies are having any impact whatsoever."

That’s Tom Kane (somewhat echoing Lant) on the Education Next podcast. His preferred alternative to the RCT-plus-systematic-review approach, though, has nothing to do with crawling any design spaces. Rather, it’s doing much more quick-turnaround quasi-experimental research using the multitudes of outcomes data now being collected in the US for teacher and school accountability purposes. All that’s apparently really missing is data on the actual inputs - there is amazingly rich longitudinal data on student test scores, but no matchable record of which textbooks are being used in different schools, or which training courses different teachers are attending. Sounds pretty sensible to me.

04 August 2025

Effective Altruism, RCTs, NGOs, & the Government End-Game

Good Ventures just gave a $25 million unrestricted grant to GiveDirectly on the advice of GiveWell. That’s a lot of good news in one sentence, but it’s not even the best part. GiveWell buried the lede when they mention, around paragraph 20, that:

"GiveDirectly plans to discuss partnerships with the following types of institutions:

- Donor aid agencies
- Developing country governments (national and local). (For example, several governors in Kenya have already approached GiveDirectly about running cash transfer programs in their counties.)"

That’s what it’s all about. To really get sustainability and scale in social policy you need government involvement. That’s why the best NGOs combine immediate direct service delivery in places where government just doesn’t have the capacity to deliver; support to interested governments to build that capacity for the longer term, often at the local level where administrators struggle to actually implement well-designed central policy documents; and innovation in new models of service delivery that governments might later adopt, of which GiveDirectly is clearly a strong example. Similarly, whilst Innovations for Poverty Action and J-PAL may have started off following that recently infamous Kremer-Miguel deworming study by working on service delivery through small NGOs, their focus is on things that can work at scale, and having built a reputation through working with NGOs they have been able to transition to working with governments (for example in Ghana and Peru).

As Jessica Brass writes,

"Government and NGOs learn from each other to improve what they do. In particular, many government agencies notice the successes achieved by NGOs and, whether intentionally or not, mimic their actions"

So yes, maybe some of the effective altruists can be accused of being philosophers not development wonks, and potentially even naive about politics, but for every anecdote-backed theoretical case for how aid might undermine the process of building citizen-state accountability, I can come up with an anecdote-backed theoretical case for how aid can support improved governance through innovation in service delivery models, and until we get some quantitative evidence on the issue, I don’t see how else we’re going to resolve the debate.

Did I miss anything?

30 June 2025

New evidence on (lack of) external validity

"Site selection bias" can occur when the probability that a program is adopted or evaluated is correlated with its impacts. I test for site selection bias in the context of the Opower energy conservation programs, using 111 randomized control trials involving 8.6 million households across the United States. Predictions based on rich microdata from the first 10 replications substantially overstate efficacy in the next 101 sites. Several mechanisms caused this positive selection. For example, utilities in more environmentalist areas are more likely to adopt the program, and their customers are more responsive to the treatment. Also, because utilities initially target treatment at higher-usage consumer subpopulations, efficacy drops as the program is later expanded. The results illustrate how program evaluations can still give systematically biased out-of-sample predictions, even after many replications.

H. Allcott in the QJE
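A toy simulation (my own sketch, not Allcott's data or model) illustrates the selection mechanism in the abstract: if the sites where the program works best are the ones that adopt and evaluate it first, early estimates will overstate what later sites can expect.

```python
import random

random.seed(0)

# Hypothetical setup: 111 sites, each with its own true treatment effect.
true_effects = [random.gauss(1.0, 0.5) for _ in range(111)]

# Extreme form of site selection: the 10 highest-effect sites run trials first.
ranked = sorted(true_effects, reverse=True)
early, late = ranked[:10], ranked[10:]

early_mean = sum(early) / len(early)
late_mean = sum(late) / len(late)
print(f"mean effect at the first 10 sites: {early_mean:.2f}")
print(f"mean effect at the next 101 sites: {late_mean:.2f}")
```

Even though every trial here is internally valid, a forecast built from the first 10 sites systematically overstates efficacy at the remaining 101 - the out-of-sample bias the paper describes.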

21 July 2025

Evidence-based policy-making US-style

Based on our rough calculations, less than $1 out of every $100 of government spending is backed by even the most basic evidence that the money is being spent wisely.
...
Since 1990, the federal government has put 11 large social programs, collectively costing taxpayers more than $10 billion a year, through randomized controlled trials, the gold standard of evaluation. Ten out of the 11—including Upward Bound and Job Corps—showed “weak or no positive effects”
Just in case you thought that there was any danger of the whole results agenda and RCT-fetishism taking over in American politics. From an excellent piece in last month's The Atlantic, which by the way is generally fantastic - I just bought a paper copy for the first time and the whole thing was full of interesting pieces: a critical look at the evidence on over-35 female fertility, a piece on how much health food is actually really unhealthy, a note on how recycling can actually increase carbon emissions because it needs more trucks on the streets, and a piece discussing relationships, gender politics, and family from the perspective of a man who has sacrificed his career for his wife's.

21 January 2025

When rigorous impact evaluation *does* make quite a big difference

If you care at all about unemployment and labour market policy, or really about much of social policy, this new paper from Esther Duflo and co-authors should have you quite worried.

The policy - pay a private provider for each unemployed person that they get into a job.

The result (part 1) - the policy was successful at getting unemployed participants into jobs.

The result (part 2) - almost all of these jobs were just taken from other people who would otherwise have got them. Pure displacement. No net change in unemployment.

Most impact evaluations don't measure such "spillover" effects or "externalities", because they are really hard to measure (neither do most non-randomised evaluations... this is not a criticism of RCTs).
Ignoring externalities, we would have thus concluded, for example, that 100,000 euros invested in the program would lead 9.7 extra people to find a job within eight months. Since the effect disappears by 12 months, this already appears to be quite expensive, at about 10,000 euros for a job found on average four months earlier. But at least, it is not counterproductive. With externalities, investing 100,000 euros leads to no improvement at all.
Bruno Crepon, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora (2012), Do labor market policies have displacement effects? Evidence from a clustered randomized experiment
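The cost-per-job arithmetic in the quote is easy to reproduce (figures taken directly from the quote above):

```python
spending = 100_000   # euros invested in the programme
extra_jobs = 9.7     # extra people in work within 8 months, ignoring externalities

cost_per_job = spending / extra_jobs
print(f"cost per job found earlier: {cost_per_job:,.0f} euros")  # ~10,309 euros

# With displacement, the net effect on total employment is roughly zero,
# so the cost per *net* job created is effectively unbounded.
```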

30 August 2024

Do Urban Livelihoods Programmes Work?

Apparently not in Sri Lanka.
The authors conduct a randomized experiment among women in urban Sri Lanka to measure the impact of the most commonly used business training course in developing countries, the Start-and-Improve Your Business program. They work with two representative groups of women: a random sample of women operating subsistence enterprises and a random sample of women who are out of the labor force but interested in starting a business. They track the impacts of two treatments -- training only and training plus a cash grant -- over two years with four follow-up surveys and find that the short and medium-term impacts differ. For women already in business, training alone leads to some changes in business practices but has no impact on business profits, sales or capital stock. In contrast, the combination of training and a grant leads to large and significant improvements in business profitability in the first eight months, but this impact dissipates in the second year. For women interested in starting enterprises, business training speeds up entry but leads to no increase in net business ownership by the final survey round.
Suresh de Mel, David McKenzie, and Christopher Woodruff, "Business training and female enterprise start-up, growth, and dynamics: experimental evidence from Sri Lanka" (HT: @timothyogden)

20 July 2025

Does deworming really work?

The latest Cochrane Collaboration review of the evidence on the impact of deworming on various outcomes has come out decidedly less than optimistic.

Here's a summary by the very smart Alexander Berger from Givewell, some discussion on the Public Library of Science blog including comment from one of the Cochrane authors and Alan Fenwick from SCI, and finally a rebuttal to the review's findings on schooling from IPA, JPAL, CEGA, Deworm the World, and the authors of the original Busia experiment on deworming.

I haven't spent enough time looking at the details to come to a strong opinion here, but one point made on the IPA blog seems evidently correct - random assignment should be enough to ensure pre-treatment balance between treatment and control. That is the whole point of random assignment. And following the recent debacle of the medical journal The Lancet being forced to retract the key finding of a social-science-y study after some actual social scientists pointed out a mathematical error, combined with my disciplinary and professional loyalties, I'm inclined to go with the social scientists rather than the doctors on this one.
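The balance point is easy to demonstrate with a quick simulation (my own sketch, not from the IPA post): over repeated random assignments of the same population, the treatment-control gap in any pre-treatment covariate averages out to zero.

```python
import random

random.seed(1)

# A fixed population with a pre-treatment covariate (say, a baseline test score).
population = [random.gauss(50, 10) for _ in range(200)]

# Repeatedly randomize half into treatment and record the baseline gap.
gaps = []
for _ in range(2000):
    shuffled = population[:]
    random.shuffle(shuffled)
    treat, control = shuffled[:100], shuffled[100:]
    gaps.append(sum(treat) / 100 - sum(control) / 100)

avg_gap = sum(gaps) / len(gaps)
print(f"average baseline gap over 2000 random assignments: {avg_gap:.3f}")
```

Any single assignment can be unbalanced by chance, but the expected gap is zero - which is exactly what the standard errors of the experiment account for.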


Update: Thoughts from David McKenzie

Doing governance is hard #163826353

First the good news: a new evaluation report from a community-driven reconstruction programme in Eastern Congo (HT: Sarah Bailey) shows yet again that it is possible to evaluate messy hard-to-measure governance interventions using rigorous quantitative methods. IPA and JPAL have an evaluation of a similar programme in Sierra Leone.

Now the bad news: this kind of design only works with interventions at the local level because you need a large sample size of units - in this case villages. National-level interventions give you a sample size of one, not very conducive for quantitative analysis.

And the worse news: these local level governance interventions don't seem to work. Both this Congo study and the Sierra Leone study find no improvement in local governance.

Now for some better news: we actually already know what a lot of the national-level governance interventions that need to be done are. They are boring. Things like audits of government accounts. South Sudan has finally just published the audit of the 2007 accounts, to apparent astonishment and outrage by parliamentarians. It's pretty grim reading. Though I'm not sure how anyone is actually honestly surprised. Still, it's probably not totally outlandish to think that audits done a bit quicker than 5 years after the fact might improve budget governance.

And now for the worst news of all: much of this easy, boring, national-level governance stuff is around accountability - which means the national leadership intentionally putting in place limits on its own power. Binding its own hands. You have to be an incredibly enlightened leader to purposely reduce your own power. The whole point of the politics game is increasing your own power. Which means that you need people to demand accountability and force leaders into action. And despite all the talk about governance from the international community, we aren't really interested or able to be the ones doing the demanding.

22 June 2025

Updates from IPA

A couple of exciting research results in the Innovations for Poverty Action (IPA) 2011 Annual Report;
Our work on youth employment in northern Uganda produced promising results in 2011. An evaluation of a government cash transfer initiative called the Youth Opportunities Initiative Program showed that transfer recipients on average were nearly twice as likely a year after receiving the grant to be employed in a skill based profession. Recipients earned on average almost 50 percent more than their peers, accruing a 35 percent rate of return on the transfer.
and results from one of the 9 ongoing evaluations of the BRAC graduation model:
Early results from West Bengal, India show that this type of ultra-poor program leads to a 15-25 percent increase in household consumption in the first year after the program’s completion.

15 June 2025

A quantitative history of RCTs


From a new report by the Behavioural Insights Unit at the British government (with Ben Goldacre and David Torgerson) on using RCTs in policy. 
Randomised controlled trials (RCTs) are the best way of determining whether a policy is working. They are now used extensively in international development, medicine, and business to identify which policy, drug or sales method is most effective. They are also at the heart of the Behavioural Insights Teamʼs methodology. However, RCTs are not routinely used to test the effectiveness of public policy interventions in the UK. We think that they should be.
(HT: Tim Harford's twitter feed)

13 April 2025

Scaling up Proven Interventions

If you are in the business of piloting development policies with NGOs, this chart should be keeping you awake at night. If you like to think about "sustainability" and "scale", about handing over your activities to the government, you need to be really really worried. 

Researchers persuaded World Vision and local government in Kenya to both implement the exact same intervention at the same time. The program as implemented by World Vision had a large impact on test scores. The exact same program, as implemented by the local government, had zero impact.


For more see Gabriel Demombynes.

His conclusion:
Evaluation skeptics may try to cite this as evidence that RCTs are a waste of time, since it suggests that successful interventions implemented by NGOs, as they often are in experiments, may not be replicated at scale by governments. Others might take the paper to indicate that NGOs should be the preferred vehicle for interventions. I think these readings would be mistaken, and I take two reflections from the paper. First, we should do many more rigorous studies working with governments where we vary forms of service delivery to better understand what can work in practice. Second, the World Bank’s approach to public services—the long, difficult slog of working to improve government systems—is the right one, because it’s the only way to ultimately make services work for the poor at large scale.
I agree. Clearly working through government systems is essential. Innovations for Poverty Action are doing just this - scaling up the exact same contract teacher program tested with NGOs in Kenya and India, but doing it with the government in Ghana, and doing an RCT as they go.

Addendum: Here is the link to the full paper: http://www.cgdev.org/doc/kenya_rct_webdraft.pdf

12 April 2025

The Impact of IPA

Measuring policy influence is hard, but this looks like a slam dunk for IPA: Concern Worldwide are launching a new Gates Foundation funded initiative focused on Innovation, Evidence, Action, and What Works.

Compare and contrast the Concern website;


and the IPA website;


Congrats all round. 

03 April 2025

NGOs, RCTs, and Institutions

There is a malicious and perverse relationship between the force of NGOs and the weakness of the Haitian state.
That is Ricardo Seitenfus, the Brazilian head of the Organization of American States mission to Haiti during the 2010 earthquake - quoted by Acemoglu and Robinson.

This is the real challenge to aid project impact evaluations, whether randomised or not, and is really the key argument made by the critics from Bauer to Easterly to Moyo. 

What isn't clear to me is how we should expect the impact of NGO activity on state strength to vary by the effectiveness of the project. Does the measured impact of the project on individuals have any relationship with the impact of the project on broader governance? It's at least plausible that a very effective health intervention (measured by outcomes for individuals) could also be effective in building state capacity for service delivery. Or on the contrary, the same measured improvement in outcomes for individuals could be consistent with undermining state capacity, by encouraging the state to spend less on health and more on other goodies, and also by poaching the best staff from the government into the NGO sector. Tricky.

Answers / more data sources on a postcard. 

30 March 2025

Yawn.... more RCT debates

Two very smart folks, Mark Rosenzweig and Martin Ravallion, have reviews of Poor Economics in the latest Journal of Economic Literature (thanks to Abhi and Andrea for the papers). Obviously self-recommending when smart economists review smart economists. But there does seem to be a bit of a rehashing.

Martin's biggest score is the "where the hell is China?" line. Some of the other criticisms are a bit weaker.
Another likely bias in the learning process is that J-PAL’s researchers have evidently worked far more with nongovernmental organizations (NGOs) than governments.
Which is a bit of a cheap shot, and a bit inaccurate. Researchers have worked with whoever will let them experiment, which yes initially was NGOs but is increasingly governments - see Peru's Quipu commission, Chile's Compass commission, the teaching assistant initiative in Ghana, working with the planning Ministry in South Africa, experimenting with police service reform in Rajasthan, even Britain's Behavioural Insights Unit.

Then
how confident can we really be that poor people all over the world will radically change their health-seeking behaviors with a modest subsidy, based on an experiment in one town in Rajasthan, which establishes that lower prices for vaccination result in higher demand?
Ummmm... well that's why J-PAL's policy recommendation for health pricing is based on 6 different studies...


Mark scores his biggest hit in the final footnote on the last page of his article;
Also absent is a discussion of the standard but major problem in the implementation of any programs or transfers targeted to the poor and that do not really spur development—moral hazard.
"Moral hazard" works at both the individual and national government level. If you get aid, you are probably less likely to work hard. The critical question is the magnitude of this effect. I think that on balance the positive value of effective aid outweighs the moral hazard, but that is more of a feeling than an evidence-based proposition. This is also one of the key points made by aid critics Bauer/Easterly/Moyo. Not necessarily that aid doesn't work, as Banerjee/Duflo would like to present their argument, but that even if aid does work, the negative moral hazard effect might outweigh the positive. I haven't seen this argument really addressed at all.

The other serious and neglected criticism for me is on general equilibrium, raised by Daron Acemoglu in the Journal of Economic Perspectives. What if you measure a positive impact of a program on earnings, but those are coming at the expense of others? A training program that increases earnings might just be equipping some individuals to out-compete others in the market, rather than necessarily increasing aggregate productivity, in which case scaling the program ain't gonna work.

So maybe I've missed them - but has anyone seen a convincing rebuttal to the moral hazard and general equilibrium critiques of micro aid project impact evaluation?

-----

Update: A couple of things I missed in my haste - Abhi points out that Rosenzweig makes good points on the sometimes tiny effect sizes lauded in Poor Economics (e.g. where "15% increase" translates to something like 2 weeks schooling or 50 cents), and that RCTs can focus our attention away from the big (important?) questions, but I felt this criticism is pretty well rehearsed.

Update 2: Also Ravallion loses points for his cliched title: "Fighting Poverty One Experiment at a Time". "x one y at a time" is a boring, tired, tired, catchphrase.

Update 3: Ravallion gains points for coining "regressionistas." 

07 March 2025

The RCT Bubble

Is it really a bubble? Whilst there has been rapid growth of impact evaluations of aid projects across agencies, and plenty of internet chatter, the vast majority of aid spending still does not get properly evaluated.  
For example, while there has been a substantial growth in impact evaluations of the World Bank development projects, only 8.8% of World Bank investment loans in 2009/10 had an impact evaluation. In 1999/00 the proportion was 2.4%.  ----- Martin Ravallion, World Bank Research Director

03 January 2025

Everything you ever wanted to know about RCTs and Microfinance but were too afraid to ask


If you’re looking for a thorough, careful, and clear summary of the latest research, and one that gets to those important design questions, you can do no better than read this review. My own chapter 6 is not nearly as complete and precise. --- David Roodman