Several months ago I described the problems in a study that seemed to have great policy relevance, but little empirical support for its contentions. Sadly, examples of studies like these abound in education, and another is currently making headlines. “Vouchers Boost Blacks’ College Enrollment Rates,” claim the stories– and boy do the effects seem large! A “24 percent increase” in college attendance among black recipients of those vouchers– what a dream. And it must be an accurate statement, right, since this was an experiment?
Well, not necessarily.
Too many practitioners, policymakers, and even researchers are far too inclined, when they hear the words “randomized trial,” to ignore the usual concerns about the reliability and validity of estimated program effects. After all, it’s the “gold standard,” touted by the Institute of Education Sciences as the most valid way to learn how well programs work. Unfortunately, its usefulness is a bit more limited than that– first, experiments don’t always work as planned in creating equivalent initial groups for later comparison, and second, they often tell us only how well the intervention worked under a set of very specific conditions and circumstances that are often crucial but rarely described in detail. Moreover, unless they are really carefully planned in advance, post-hoc analyses can get particularly squirrelly when it comes to estimating different effects for different people.
For these reasons, I’m not sharing in the wild enthusiasm over the new Brookings study by Paul Peterson and Matt Chingos that purports to show that vouchers provide a big boost in college attendance for a very at-risk group: African-Americans.
I started laying out these concerns a few days ago via Twitter, but am restating and summarizing them here, in case it’s useful to those who don’t spend all of their time obsessing about methodology and need to know what really works in education.
Here are three reasons why the findings don’t pass my sniff test:
(1) The estimated average treatment effect of offering the voucher is null. Since the effect of receiving the voucher is positive and large for one group– African-Americans– this implies that the effect must be negative for another group, and yet this is never mentioned. Why? It’s rather unusual to show effects for only selected groups, and not for all of them. Most importantly, it goes against best practices.
(2) The only subgroup with effects, African-Americans, is a group that doesn’t seem to have equivalent treatment and control groups before the offer of the voucher. If anything, the treatment group students seem more inclined toward college attendance independent of the voucher, given that more of their parents have bachelor’s degrees (while other factors are also imbalanced, this one is a known driver of college attendance, among the most important). While the authors attend to this issue a bit, and try one kind of sensitivity analysis to adjust for it, in their text they fail to give these potential flaws all of the caution they deserve– even going so far as to make this finding the main highlight of the paper.
(3) In the paper and the press the authors stress the effects of receiving a voucher, but voucher receipt is not randomly assigned. So if you are excited about the experimental component– in a study that claims to be “The first to measure the impact of school vouchers on college enrollment” — you need to know that the main result (for example, see paragraph 1 here) isn’t experimental. This is a quasi-experimental approach and is subject to the usual kinds of errors.
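The arithmetic behind point (1) is simple: a null overall effect is just a weighted average of the subgroup effects, so a large positive effect for one subgroup mechanically forces a negative effect for everyone else. A minimal sketch, using entirely hypothetical numbers (the subgroup share and effect size below are made up for illustration, not taken from the study):

```python
# Hypothetical illustration: if the overall average treatment effect is
# null and one subgroup shows a positive effect, the weighted average
# forces the implied effect for the complement subgroup to be negative.

share_subgroup = 0.4     # hypothetical share of the sample in the subgroup
effect_subgroup = 0.071  # hypothetical +7.1 pp effect for that subgroup
overall_ate = 0.0        # overall effect reported as null (assumed exactly 0)

# overall_ate = share * effect_subgroup + (1 - share) * effect_other
effect_other = (overall_ate - share_subgroup * effect_subgroup) / (1 - share_subgroup)
print(f"implied effect for everyone else: {effect_other:.3f}")
```

With a 40 percent subgroup share and a +7.1 point subgroup effect, the implied effect for the remaining 60 percent is about −4.7 points– which is exactly why the unreported complement groups matter.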
Are these flaws par for the course, and thus no big deal? I don’t think so. There was an evident PR effort behind this report, and it’s led to widespread reporting of a study that really needs more vetting. Words like “the Brookings Institution at Harvard” (sidenote: huh?) give it more credibility than it deserves at this stage, and the term “experiment” makes folks feel overly confident in the results.
Now, all that said, I do understand how these things can happen. Since they suggest differential responsiveness to programs (and thus the potential for cutting costs while increasing program effectiveness), subgroup analyses are quite seductive and compelling, as are randomized trials themselves. Last year, my colleagues and I wrote about some tentative findings from our study of financial aid that suggested effect heterogeneity. Prior to the release, we extensively vetted those findings with colleagues, and ran at least five different sensitivity analyses. After publication of the working paper, which we were careful to describe as “in progress,” we sought even more feedback and advice– and got a crash course in the enormous difficulty in disentangling effect heterogeneity from heterogeneous treatments. Truth is, the work is still ongoing. And that’s an incredibly important and valuable part of the research process, and one we all should wish and allow for– it makes the work better.
So, here’s to hoping that this is what will happen next for this voucher study. Instead of rolling full steam ahead thinking vouchers will magically boost college attendance for black students everywhere, let’s support the authors working through all potential alternative explanations for these odd results, and then replicating their experiment. Again, my own experience suggests replication is critical, revealing the processes and contexts under which effects occur and are more reliable. We should all demand it, especially from high-profile studies like these.