17 April 2019

Coaching is better than training, but there is still a question mark on scalability

"So should governments switch to frequent coaching sessions? Possibly, but the next step should first be to try this type of intervention at scale. 
Finding three highly skilled coaches is one thing, but you might need hundreds or thousands of them if you were to run a similar programme across an entire country. 
One potential route to scale is through new uses of technology. A study in Brazil found positive impacts of a virtual-coaching programme run via Skype, for example. 
But perhaps the most straightforward type of technology to go for is scripts, which this paper suggests have positive effects on learning both when presented through centralised training and intensive coaching."

12 March 2019

"Maybe one of the most cost-effective interventions ever studied"

In this month's TES column (I'm calling it a column, it sounds better than a blog), I call parent-teacher meetings in Bangladesh "maybe one of the most cost-effective interventions ever studied". Here's the maths behind that claim. 

First, the intervention found a 0.377 standard deviation effect on Grade 5 scores and a 0.141 standard deviation (not statistically significant) effect on Grade 3 scores. If we take the average of those, that is 0.259. That's equivalent to around 1.7 extra years of school (based on Evans & Yuan's estimate that 1 standard deviation ~ 6.5 years of school).

The cost was $3 per student over the two years. The author Asad Islam does the conversion using only the 0.377 effect size for Grade 5, writing "Thus, the cost per average 0.1 SD increase in test scores per student is $0.66 or $1.58 for the full program over 2 years."

J-PAL put together a list of the cost-effectiveness of different interventions on their website, now gone, but replicated by Romero, Sandholtz, & Sandefur in the Liberia Partnerships Schools paper (copied below). Islam's $1.58 per 0.1 SD increase is equivalent to 6.3 standard deviations per $100. If we use the more conservative estimate of 0.259 SD (averaging across Grade 5 and Grade 3 results) that still works out at 4.3 SD per $100 spent. That lower estimate still puts this intervention at third place in the ranking, so there you go: "maybe one of the most cost-effective interventions ever studied".
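
For anyone who wants to check the sums, here is a minimal sketch of the arithmetic in Python. The numbers are the ones quoted above; the variable and function names are my own, and the "conservative" figure assumes the implied programme cost stays fixed while the effect size is scaled down from 0.377 to the 0.259 average.

```python
# Figures quoted in the post (effect sizes in standard deviations).
EFFECT_GRADE_5 = 0.377
EFFECT_GRADE_3 = 0.141
YEARS_PER_SD = 6.5          # Evans & Yuan conversion quoted above
COST_PER_TENTH_SD = 1.58    # Islam's full-programme cost (USD) per 0.1 SD


def average_effect(effects):
    """Simple unweighted mean of the effect sizes."""
    return sum(effects) / len(effects)


def sd_per_100_dollars(cost_per_tenth_sd):
    """Convert a cost per 0.1 SD into SD gained per $100 spent."""
    return 0.1 / cost_per_tenth_sd * 100


avg_effect = average_effect([EFFECT_GRADE_5, EFFECT_GRADE_3])
years_equivalent = avg_effect * YEARS_PER_SD

# Headline figure, using only the Grade 5 effect as Islam does.
sd_per_100_grade5 = sd_per_100_dollars(COST_PER_TENTH_SD)

# Conservative figure (assumption: same implied cost, smaller average effect),
# so the SD per $100 shrinks in proportion to the effect size.
sd_per_100_average = sd_per_100_grade5 * avg_effect / EFFECT_GRADE_5

print(f"Average effect: {avg_effect:.3f} SD")
print(f"Years of school equivalent: {years_equivalent:.1f}")
print(f"SD per $100 (Grade 5 only): {sd_per_100_grade5:.1f}")
print(f"SD per $100 (average effect): {sd_per_100_average:.1f}")
```

Rounded to one decimal place, this gives the 6.3 and 4.3 SD per $100 used in the text.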


11 March 2019

The Latest Economics Research on Global Education

Last week I was at the Society for Research on Educational Effectiveness (SREE) conference. Alex Eble made a big and apparently successful push to increase representation by researchers focused on developing countries. In time-honoured Dave Evans style, here's my one-sentence roundup of 22 idiosyncratically selected studies presented at the conference. You can see the full programme here.

---

Public-private partnerships

A subsidy for private schools in Haiti led to higher enrolment (Adelman, Holland, and Heidelk) #Haiti

Chile has a universal school voucher and a higher voucher targeted at low-income students. The universal voucher is better for aggregate efficiency but worse for equity (Sanchez) #Chile #StructuralModel

Giving out vouchers to attend 5 years of low-cost private primary school in Delhi led to worse Hindi scores and no change in English or Maths (Crawfurd, Patel, and Sandefur) #India

Contracting out management of public schools to NGOs in Liberia led to a 60% increase in learning (Romero, Sandholtz, and Sandefur) #Liberia

School management

A mobile-phone-based support programme for school councils in Pakistan led to no improvement for students (Asim) #Pakistan #Diff-in-Diff

A major school inspection reform in Madhya Pradesh led to no improvement in schools (Muralidharan and Singh) #India

Independent monitoring of teachers led to better student performance (Kim, Yang, Inayat) #Pakistan #Diff-in-Diff

Mindfulness

Mindfulness interventions reduced sadness and aggression of children in Niger (Kim, Brown, De Oca, Annan, Aber), improved concentration and prosocial behaviour in Sierra Leone (Brown, Kim, Annan, Aber), and increased prosocial behaviour amongst Syrian refugees (Keim and Kim) #Niger #SierraLeone #Syria

Information for parents

Giving parents information about their child’s performance led to some temporary improvements (Barrera-Osorio, Gonzalez, Lagos, Deming) #Colombia

Incentives for teachers 

The theoretically optimal “Pay for Percentile” incentive scheme works to increase effort, which is complementary to inputs (Gilligan, Karachiwalla, Kasirye, Lucas, Neal) #Uganda

BUT a simpler “threshold” incentive scheme can be as effective as the theoretically optimal “Pay for Percentile” (at least in the short run) (Mbiti, Romero, Schipper) #Tanzania

Methodology

Studies commissioned by the developer of an intervention find effect sizes 80% larger than studies commissioned independently (Wolf, Morrison, Slavin, Risman) #USA #MetaAnalysis #EvaluatorIndependence

Tests designed specifically for evaluations produce effect sizes 63% larger than generic tests (Pellegrini, Inns, Lake, Slavin) #USA #MetaAnalysis #TestDesign

External validity bias (non-random selection of schools into trials) is twice as big as internal validity bias (from using observational not experimental methods) (White, Hansen, Lycurgus, Rowan) #USA #ExternalValidity

Technology

The One Laptop Per Child programme in Peru had zero effect on learning (Cristia, Ibarrarán, Cueto, Santiago and Severín) #Peru

In addition, providing internet had no effect on student learning (Malamud, Cueto, Cristia, Beuermann) #Peru

Peer effects

Being the weakest student in a better (selective) school can be worse than being the strongest student in a worse school (Fabregas) #RDD #Mexico

Finance

Temporary subsidies can have permanent effects on enrolment (Nakajima) #Indonesia #Diff-in-Diff

Merit-based scholarships have bigger effects than need-based scholarships (Barrera-Osorio, de Barros, Filmer) #Cambodia

Heat


Each 1 degree Fahrenheit increase in school-year temperature reduces learning by 1 percent. Air conditioning entirely offsets this. (Goodman, Hurwitz, Park, Smith) #FE #USA

18 February 2019

Is testing good for education?

This post was first published on the Centre for Education Economics website. 
I blogged recently about a new RISE working paper by Annika Bergbauer, Eric Hanushek, and Ludger Woessmann, which finds that:
“standardized external comparisons, both school-based and student-based, is associated with improvements in student achievement.”
William Smith pointed me to his rebuttal blog written with Manos Antoninis, which argues that there are “multiple weaknesses in their analysis that undermine their conclusions”.

This blog is my attempt to make sense of the disagreement. The main issue appears to me to be a misunderstanding by Antoninis & Smith (“AS” from here on) of the mechanism proposed by Bergbauer, Hanushek, and Woessmann (BHW). AS presume that the main mechanism through which testing is hypothesised to improve outcomes is through school choice (allowing parents to shift their students to schools with better test scores) or through punitive government accountability for teachers and schools. But BHW make it clear that their main focus is on the principal-agent relationship between parents as the principal and both students and teachers as their agents. Parents can’t observe the effort made by students and teachers, but standardized testing can provide them with a proxy indicator for effort. This should induce greater effort from both students and teachers. This proposed mechanism has nothing to do with school choice or accountability from government.

First AS argue that
“Our review of the evidence found that evaluative policies promoting school choice exacerbated disparities by further advantaging more privileged children (pp. 49-52).”
This review of the evidence on pp. 49-52 of the UNESCO Global Monitoring Report focuses on policies designed to promote school choice. But that is not at all the focus of the BHW analysis, which is on policies that allow for the comparison of schools and students with the purpose of incentivising greater effort. School choice doesn’t need to have anything to do with it. As BHW write:
“That is the focus of this paper: By creating outcome information, student assessments provide a mechanism for developing better incentives to elicit increased effort by teachers and students, thereby ultimately raising student achievement levels to better approximate the desires of the parents”
Second, AS argue that
“punitive systems had unclear achievement effects but troublesome negative consequences, including removing low-performing students from the testing pool and explicit cheating (pp. 52-56).”
As mentioned above, the proposed mechanism in BHW does not at all require a punitive system. BHW write:
“accountability systems that use standardized tests to compare outcomes across schools and students produce greater student outcomes. These systems tend [my emphasis] to have consequential implications and produce higher student achievement than those that simply report the results of standardized tests.”
Having said that, there are some flaws in the literature review cited by AS. This section first cites studies on four individual countries (US, Brazil, Chile, South Korea), without noting that there are significantly positive results from two of them. One of the two papers they cite on Brazil (IDados 2017) concludes that there was “a large, continuous improvement in all those years in both absolute and relative terms when compared to other municipalities in the Northeastern region and in Brazil as a whole ” and “it is very likely that [the reform] is at least partially responsible for the changes.” On Chile, a paper not cited as it was published in 2017 just after the review was completed (Murnane et al) found that “On average, student test scores increased markedly and income-based gaps in those scores declined by one-third in the five years after the passage of [the reform]”.

Next the review cites two papers (Yi 2015; Gándara and Randall 2015) that present correlational analysis with no attempt to address any potential bias from omitted variables or reverse causality. The latter study is based on a small sub-sample of the fuller data used by BHW.

Next AS take issue with the way that BHW construct their 4 categories of test usage. For ease of reference I first reproduce below the 4 categories, along with the wording of the questions that go into constructing each category.

---
1. Standardized External Comparison
  • “In your school, are assessments of 15-year old students used to compare the school to district or national performance?” (PISA)
  • existence of national/central examinations at the lower secondary level (OECD, EAG)
  • National exams (primary) (Eurydice (EACEA))
  • Central exit exams end secondary (Leschnig, Schwerdt, and Zigova (2017))

2. Standardized Monitoring
  • “Generally, in your school, how often are 15-year-old students assessed using standardized tests?” (PISA)
  • “During the last year, have [tests or assessments of student achievement] been used to monitor the practice of teachers at your school?” (PISA)
  • “In your school, are achievement data … tracked over time by an administrative authority[?]”

3. Internal testing
  • whether assessments are used “to inform parents about their child’s progress.”
  • use of assessments “to monitor the school’s progress from year to year.”
  • “achievement data are posted publicly (e.g. in the media).” (vaguely phrased and is likely to be understood by school principals to include such practices as posting the school mean of the grade point average of a graduating cohort, derived from teacher-defined grades rather than any standardized test, at the school’s blackboard.)

4. Internal teaching monitoring
  • whether assessments are used “to make judgements about teachers’ effectiveness.”
  • practice of teachers is monitored through “principal or senior staff observations of lessons.”
  • “observation of classes by inspectors or other persons external to the school” are used to monitor the practice of teachers.
---

First, AS argue that question 3c should really fall under category 1. The effect of this question on outcomes is mostly statistically insignificant, though for Maths and Science the direction of the coefficients in the interacted model is the same as for the other variables in category 1 (positive in the base model, with a negative coefficient on the interaction with initial score). Would adding this one variable to the 4 variables already in the category make the results statistically insignificant overall? I think probably not, but can’t say for sure without looking at the raw data.

Second, AS claim that question 4a should really fall under category 1 or 2. This claim seems debatable. The theoretical mechanism that BHW put forward is that providing credible information to parents induces greater effort from teachers. This use of testing is clearly internal to the school, and could clearly mean internal school assessments rather than necessarily standardized assessments that allow for external comparison with teachers at other schools.

Third, AS criticise the inclusion of high stakes student assessments as indicators, as by placing the stakes on students and not schools they do not relate to accountability from government. But this is not what BHW claimed was driving the effect.

Fourth, AS suggest the use of standardized testing in grade 15 may be effectively “teaching to the test”. This seems odd to me - they clearly aren’t literally teaching to the test because it is a different test. BHW are looking at the effect of introducing high-stakes national standardized testing on student results in a totally separate, low-stakes sample-based test (PISA). AS then don’t really address the argument that “teaching to the test” can also be a positive thing if the test is well-designed and includes a good sample of the things that students are expected to have learnt.

Finally, AS focus only on those results that are statistically significant in the baseline model (estimating the average effect across all countries). However, they miss a really important conclusion from the paper, which is about heterogeneity. The effects of testing are largest for the weakest performing systems. This is clear in Figure 3.




Looking at the interacted model (Table A5), both of the other 2 questions in category 2 (2b and 2c) are statistically significant.

To sum up, there are weaknesses in the interpretation by AS of BHW which undermine their criticism. BHW focus on the role that testing can play in increasing the effort of students and teachers, with or without government accountability systems. In addition, the review of government accountability systems presented in the UNESCO Global Monitoring Report also has weaknesses, and presents an unduly negative picture. My prior remains that standardized testing plays a positive role, particularly in weak systems.

Thanks to Gabriel Heller-Sahlgren, William Smith, and Manos Antoninis for comments on a draft of this post. This acknowledgement clearly does not imply that Smith and Antoninis agree with this post - they don’t!

06 February 2019

CfEE Blogging: Giving students information on future wages improves school outcomes

As of this January and following last year's Annual Research Digest from the Centre for Education Economics, I'll be co-editing the Monthly Digest, along with Gabriel Heller-Sahlgren.

This is basically an excuse and commitment device to get me actually blogging again on at least a monthly basis. Each issue will include commentary on new papers, plus a selection of abstracts from recent publications (lightly edited for jargon).


My first comment is on a new paper by Ciro Avitabile and Rafael de Hoyos.
Did you know what career you wanted to do when you were in secondary school? I didn’t. Most pupils make critically important choices that will affect their lives throughout their educational career, often on the basis of poor information about what those choices will mean for their future. In most countries, there is little transparency on the costs and benefits of pursuing education and information on the various career paths available. 
In this paper, Ciro Avitabile and Rafael de Hoyos study whether or not providing pupils with better information about the earnings returns to education and the options available leads to greater effort and learning. Several studies have previously shown that providing information about the wage gains from schooling leads pupils to stay in school a bit longer, and affects their educational choices, but there is limited evidence that such information can affect learning per se, at least in a slightly longer-term perspective.

16 January 2019

PubhD Kigali

For any readers in (or visiting) Kigali (presumably a niche audience), I've started a monthly research talk event, using the PubhD format that is now running in around 20 European cities.

3 speakers get 10 minutes each to present their research, followed by Q&A. It's a great way to learn a bit about some random subjects you might not have considered much before, for the speakers to practise their extended elevator pitch, and a pretty low-effort way for me to organise some kind of regular, vaguely seminar-like academic discussion.

The next one is this Thursday at 7.30pm, see here for more details, and get in touch if you'd like to speak sometime.

Does temporary migration from rich to poor countries cause commitment to development?

Never mind that none of the journals I've sent it to so far are interested, my new working paper got picked up by Marginal Revolution, the king of economics blogs, which is probably way better anyway, right?
Public support in rich countries for global development is critical for sustaining effective government and individual action. But the causes of public support are not well understood. Temporary migration to developing countries might play a role in generating individual commitment to development, but finding exogenous variation in travel with which to identify causal effects is rare. In this paper I address this question using a natural experiment – the assignment of Mormon missionaries to two-year missions in different world regions – and test whether the attitudes and activities of returned missionaries differ. Data comes from a unique survey gathered on Facebook. Missionaries assigned to treatment regions (Africa, Asia, Latin America) are balanced with those assigned to the control region (Europe) on high school test scores and prior language and travel experience. Those assigned to the treatment region report greater interest in global development and poverty, but no difference in support for government aid or higher immigration, and no difference in personal international donations, volunteering, or other involvement.
Here's the link to the paper and the Twitter discussion.

15 January 2019

Testing, testing: the 123's of testing

Here's my summary of the new Annika Bergbauer, Eric Hanushek, and Ludger Woessmann working paper for CfEE.
"teachers tend to oppose standardised tests, partly because they perceive them to narrow the curriculum and crowd out wider learning. However, it is intuitive that the effects of testing could vary dramatically by context. Indeed, the impact may very well follow a so-called “Laffer curve”. At low levels of testing, an increase may lead to better performance as it provides relevant information and incentives to actors in the education system. Yet if there are already high levels of testing, further increases may very well decrease performance, due to stress, for example, or the effects of an overly-narrowed curriculum. If so, we should expect the impact of testing to follow an inverted U-curve – or at the very least display diminishing returns. Furthermore, the impact of tests is also likely to depend on exactly how they are used in the education system. 
This paper provides perhaps the first systematic evidence on these issues"
Read the rest here.