Vouchers and College Attendance: Puzzling Findings Deserve Much Caution

August 28, 2012 | Blog

Several months ago I described the problems in a study that seemed to have great policy relevance, but little empirical support for its contentions.  Sadly, examples of studies like these abound in education, and another is currently making headlines.  “Vouchers Boost Blacks’ College Enrollment Rates,” claim the stories– and boy do the effects seem large! A “24 percent increase” in college attendance among black recipients of those vouchers– what a dream. And it must be an accurate statement, right, since this was an experiment?

Well, not necessarily.

Too many practitioners, policymakers, and even researchers are far too inclined, when they hear the words "randomized trial," to ignore the usual concerns about the reliability and validity of estimated program effects.  After all, it's the "gold standard," touted by the Institute of Education Sciences as the most valid way to get a sense of how well programs work. Unfortunately, its usefulness is a bit more limited than that– first, experiments don't always work as planned in creating equivalent initial groups for later comparison, and second, they often tell us only how well the intervention worked under a set of very specific conditions and circumstances that are often crucial but rarely described in detail.  Moreover, unless they are really carefully planned in advance, post-hoc analyses can get particularly squirrelly when it comes to estimating different effects for different people.

For these reasons, I’m not sharing in the wild enthusiasm over the new Brookings study by Paul Peterson and Matt Chingos that purports to show that vouchers provide a big boost to college attendance to a very at-risk group: African-Americans.

I started laying out these concerns a few days ago via Twitter, but am restating and summarizing them here, in case it’s useful to those who don’t spend all of their time obsessing about methodology and need to know what really works in education.

Here are three reasons why the findings don’t pass my sniff test:

(1) The estimated average treatment effect of offering the voucher is null.  Since the effect of receiving the voucher is positive and large for one group– African-Americans– this implies that the effect must be negative for another group, and yet this is never mentioned.  Why? It's rather unusual to show effects for only selected groups, and not for all of them. Most importantly, it goes against best practices.
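
The arithmetic behind this point is simple: the overall effect is just the share-weighted average of the subgroup effects, so a (roughly) null overall estimate combined with a large positive effect for one subgroup forces a negative effect somewhere else. Here is a minimal sketch– the shares and effect sizes below are hypothetical illustrations, not the study's actual numbers:

```python
# Hypothetical illustration: the overall treatment effect is the
# share-weighted average of subgroup effects.  All numbers invented.
shares  = {"black": 0.42, "hispanic": 0.45, "other": 0.13}  # hypothetical shares
effects = {"black": 7.0,  "hispanic": 1.0}                  # hypothetical pp effects

overall = 0.0  # suppose the pooled estimate is (approximately) zero

# Solve for the effect the remaining subgroup must have for the
# weighted average to come out null:
known = sum(shares[g] * effects[g] for g in effects)
effects["other"] = (overall - known) / shares["other"]

print(round(effects["other"], 1))  # a sizable negative effect is implied
```

With these made-up inputs the residual subgroup's implied effect is strongly negative, which is exactly why reporting only the positive subgroup is misleading.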

(2) The only subgroup with effects, African-Americans, doesn't seem to have equivalent treatment and control groups before the offer of the voucher.  If anything, the treatment group students seem more inclined toward college attendance independent of the voucher, given that more of their parents have bachelor's degrees (while other factors are also imbalanced, this one is a known driver of college attendance, among the most important).  While the authors attend to this issue a bit, and try one kind of sensitivity analysis to adjust for it, in their text they fail to give the potential flaws all of the cautions they deserve– even going so far as to make this finding the main highlight of the paper.

(3) In the paper and the press, the authors stress the effects of receiving a voucher, but voucher receipt is not randomly assigned.  So if you are excited about the experimental component– in a study that claims to be "The first to measure the impact of school vouchers on college enrollment"– you need to know that the main result (for example, see paragraph 1 here) isn't experimental. This is a quasi-experimental approach and is subject to the usual kinds of errors.
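
For readers less steeped in this distinction: the randomly assigned quantity is the *offer* of a voucher (the intent-to-treat, or ITT, effect), while the effect of *using* a voucher (treatment-on-the-treated, TOT) is typically recovered by rescaling the ITT by the take-up rate, under additional non-experimental assumptions. A hedged sketch with made-up numbers:

```python
# Hypothetical illustration of the ITT -> TOT rescaling (the Bloom/Wald
# estimator).  All numbers are invented; they are not the study's estimates.

itt     = 4.0   # pp effect of being OFFERED a voucher (randomly assigned)
take_up = 0.70  # share of the offered group that actually USED a voucher

# Under the usual assumptions (no control-group voucher use, and the offer
# affects outcomes only through voucher use), the effect of USE is:
tot = itt / take_up

print(round(tot, 2))  # the TOT estimate is mechanically larger than the ITT
```

The point is that the TOT number rests on those extra assumptions, which is why reporting it as "the" experimental result overstates what the randomization alone delivers.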

Are these flaws par for the course, and thus no big deal? I don't think so.  There was an evident PR effort behind this report, and it's led to widespread reporting of a study that really needs more vetting.  Phrases like "the Brookings Institution at Harvard" (sidenote: huh?) give it more credibility than it deserves at this stage, and the term "experiment" makes folks feel overly confident in the results.

Now, all that said, I do understand how these things can happen.  Since they suggest differential responsiveness to programs (and thus the potential for cutting costs while increasing program effectiveness), subgroup analyses are quite seductive and compelling, as are randomized trials themselves. Last year, my colleagues and I wrote about some tentative findings from our study of financial aid that suggested effect heterogeneity. Prior to the release, we extensively vetted those findings with colleagues, and ran at least five different sensitivity analyses.  After publication of the working paper, which we were careful to describe as “in progress,” we sought even more feedback and advice– and got a crash course in the enormous difficulty in disentangling effect heterogeneity from heterogeneous treatments. Truth is, the work is still ongoing.  And that’s an incredibly important and valuable part of the research process, and one we all should wish and allow for– it makes the work better.

So, here’s to hoping that this is what will happen next for this voucher study.  Instead of rolling full steam ahead thinking vouchers will magically boost college attendance for black students everywhere, let’s support the authors working through all potential alternative explanations for these odd results, and then replicating their experiment.  Again, my own experience suggests replication is critical, revealing the processes and contexts under which effects occur and are more reliable.  We should all demand it, especially from high-profile studies like these.


  1. Stuart Buck

    August 29, 2012

    On the first point: For blacks (42% of the sample), the effect was positive and significant, but noisy. For Hispanics (another 45% of the sample), the point estimate was positive too, just insignificant and noisy as well. For the rest, the study doesn’t say, but suppose it was negative. What then? That doesn’t give one a reason to doubt the validity of the findings, which are consistent with the test score findings from earlier studies of the same NYC program. (Note that several charter school studies find similar effects going in opposite directions, which seem like the perfect solution to the achievement gap!)

    On the second point: you say that among blacks, the treatment group seems more inclined to college attendance. This seems to be based solely on the fact that among the treatment group and control group, black parents had graduated from college at a rate of 16% and 12% respectively.

    But you omit quite a bit here: Among control group blacks, 47% of the parents had “some college,” while among treatment group blacks, merely 42% of parents had “some college.” Add the two together, and 49% of control group parents but 48% of treatment group parents had some college or a college degree. Seems fairly equal. Why would you think that because treatment group parents were very slightly more likely to have “some college” rather than a degree, this would somehow render suspect the finding that their children were more likely to have some college as well? (College enrollment, not graduation, is what this study was examining.)

    On top of that, treatment group parents seem to have been more likely to have dropped out of high school: 15% of them only had “some high school” rather than a high school degree, whereas 11% of control group parents had only “some high school.”

    Corresponding to this, 42% of the treatment group had “father absent,” compared to 36% of the control group (again, speaking just about blacks here).

    Granted, the original methodology should have used pair matching so that such discrepancies would not have arisen, but given the data that exist now, why focus on the single variable where one can tease out a treatment group advantage, while ignoring several other variables that point the other way? Why wouldn’t these balance out, at a minimum, if not disfavor the treatment group?

    On the third point, the impact of actually using a voucher is not randomly assigned, yes, but the effect isn’t that much greater than the impact of being offered a voucher, which was indeed randomly assigned, and which is mentioned throughout the study.

    Finally, I do join with your conclusion calling for more vouchers and more studies.

  2. Sara Goldrick-Rab

    August 29, 2012

    Stuart, some responses:

    (1) I have never seen the results of a randomized trial hedged as much as these are. Talking about effects that are positive but noisy? Noisy means something-- unless you are claiming post hoc that the analyses were underpowered?

    (2) Of course it matters if part of the effect is negative-- it means that a positive for one group came at some expense for another group. I assume we want to do no harm, to anyone?

    (3) Most recent studies show a nonlinearity with a major additional return for both college attendance and completion accruing to students whose parents completed a 4-year degree-- not just some college. This makes sense, since those are the parents with higher economic returns, more stability, and more social and cultural capital. Unlike this consistent set of findings, "father absent" bears a smaller influence on college attendance-- typically the big issue is presence of the mother. This is what I kept telling you and Matt on Twitter-- the issue isn't merely balance of observables, but also the importance of those imbalanced observables for the outcomes of interest (Chris Taber's work elaborates on this). That's why I focus on this key variable-- it's not like you simply "add up" all imbalances and decide they "balance out."

    (4) If the TOT isn't much different than the ITT, and the ITT is a cleaner estimate, then it should be the one reported.

    (5) I'm definitely not calling for more vouchers. But I am calling for more studies.

  3. Stuart Buck

    August 29, 2012

    Well, you can't do studies of the long-term effects of vouchers without having vouchers around long-term on which to do the analyses!

    Noisiness can just be heterogeneity of impact, not lack of power, right? Lack of power implies that there is a single true effect for all people and you just need to estimate it more precisely, but that seems to be an unlikely assumption in almost all interesting policy discussions.

    Anyway, even if you assume that kids with college graduate parents are 100% guaranteed to enroll in college (which seems optimistic, no?), and kids with "some college" parents are only 50% likely to enroll in college, it's hard to see how that 4-point discrepancy could possibly explain away very much of the 7.1-point ITT impact.
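
    The bounding arithmetic in that last paragraph can be made explicit. Under the (deliberately extreme) assumptions stated above, the 4-point gap in parents with a BA shifts expected enrollment by at most 4 × (100% − 50%) = 2 percentage points, against a 7.1-point ITT estimate. A sketch, using only the figures quoted in this thread:

```python
# Back-of-the-envelope bound from the comment above, with its
# (deliberately extreme) assumptions made explicit.

ba_gap          = 0.04  # treatment-control gap in parents with a BA (16% vs 12%)
p_enroll_ba     = 1.00  # assumed enrollment rate for kids of BA parents (extreme)
p_enroll_someco = 0.50  # assumed rate for kids of "some college" parents

# Maximum enrollment advantage the parental-education imbalance could confer:
max_bias = ba_gap * (p_enroll_ba - p_enroll_someco)

itt = 7.1  # reported ITT impact in percentage points
print(round(max_bias * 100, 1), "vs", itt)  # at most 2.0 points of the 7.1
```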


© 2013 The EduOptimists. All Rights Reserved.