What We Know Now (and How It Doesn’t Matter)

Randy Olson’s Flock of Dodos (2006) explores the evolution and Intelligent Design (ID) debate that represents the newest attack on teaching evolution in U.S. public schools. The documentary is engaging, enlightening, and nearly too fair, considering that Olson admits up front that he stands with scientists who support evolution as credible science and reject ID as something outside the realm of science.

Olson’s film, however, offers a powerful message that rises above the evolution debate. Particularly in the scenes depicting scientists discussing (during a poker game) why evolution remains a target of political and public interests, the documentary shows that evidence-based expertise often fails against clear and compelling messages (such as “teach the controversy”)—even when those clear and compelling messages are inaccurate.

In other words, ID advocacy has often won in the courts of political and public opinion despite having no credibility within the discipline it claims to inform—evolutionary biology.

With that sobering reality in mind, please identify what XYZ represents in the following statement about “What We Know Now”:

Is there a bottom line to all of this? If there is one, it would appear to be this: Despite media coverage, which has been exceedingly selective and misrepresentative, and despite the anecdotal meanderings of politicians, community members, educators, board members, parents, and students, XYZ have not been effective in achieving the outcomes they were assumed to aid….

This analysis addresses school uniform policies. It was conducted by sociologist David L. Brunsma, who examined evidence on school uniform effectiveness (whether school uniform policies achieved the stated goals of those policies) “from a variety of data gathered during eight years of rigorous research into this issue.”

This comprehensive analysis of research from Brunsma replicates the message in Flock of Dodos—political, public, and media messaging continues to trump evidence in the education reform debate. Making that reality more troubling is that a central element of No Child Left Behind was a call to usher in an era of scientifically based education research. As Sasha Zucker notes in a 2004 policy report for Pearson, “A significant aspect of the No Child Left Behind Act of 2001 (NCLB) is the use of the phrase ‘scientifically based research’ well over 100 times throughout the text of the law.”

Brunsma’s conclusion about school uniform policies, I regret to note, is not an outlier in education reform but a typical representation of education reform policy. Let’s consider what we know now about the major education reform agendas currently impacting our schools:

Well into the second decade of the twenty-first century, then, education reform continues a failed tradition of honoring messaging over evidence. Neither the claims made about educational failures nor the solutions proposed by education reform policy today are supported by large bodies of compelling research.

As the fate of NCLB continues to be debated, the evidence shows not only that NCLB has failed its stated goals, but also that politicians, the media, and the public have failed to embrace the one element of the legislation that held the most promise—scientifically based research—suggesting that dodos may in fact not be extinct.


NFL again a Harbinger for Failed Education Reform?

During the impending NFL strike in 2011—the act of a union—I drew a comparison between how the public in the U.S. responds to unionization in different contexts:

“I am speaking about the possible NFL strike that hangs over this coming Super Bowl weekend: a struggle between billionaires and millionaires, which, indirectly, shines an important light on the rise of teacher and teacher union-bashing in the US. Adam Bessie, in Truthout, identifies how the myth of the bad teacher has evolved.”

Once again, the NFL faces a situation that I believe, and even hope, is another harbinger of how education reform can be halted. The family of Junior Seau has filed a lawsuit against the league:

“The family said the league not only ‘propagated the false myth that collisions of all kinds, including brutal and ferocious collisions, many of which lead to short-term and long-term neurological damage to players, are an acceptable, desired and natural consequence of the game,’ but also that ‘the N.F.L. failed to disseminate to then-current and former N.F.L. players health information it possessed’ about the risks associated with brain trauma.”

This lawsuit has prompted considerable debate about whether the NFL as we currently know it could be dramatically reconfigured under the pressure of more lawsuits. In other words, the inherent but often ignored or concealed dangers of football are now being exposed by legal action, in much the same way that the tobacco industry was unmasked, radically changing the entire culture of smoking over the last couple of decades.

With the release of the Education Policy Analysis Archives (EPAA) Special Issue on “Value-Added Model (VAM) Research for Educational Policy,” a similar question should now be raised about the future of implementing high-stakes accountability policies that focus on teacher evaluation and retention through VAM-style metrics.

“High-Stakes Implementation of VAM,…Premature”

Two articles in the special issue from EPAA examine the validity and reliability of VAM-based teacher evaluation in high-stakes settings and then place these policies in the context of the legal ramifications districts and states may face for those policies.

“The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era” (Baker, Oluwole, & Green, 2013) identifies the current trend: “Spurred by the Race-to-the-Top program championed by the Obama administration and a changing political climate in favor of holding teachers accountable for the performance of their students, many states revamped their tenure laws and passed additional legislation designed to tie student performance to teacher evaluations” (p. 3). Because of the political and public momentum behind reforming teacher evaluation, Baker, Oluwole, and Green seek “to bring some urgency to the need to re-examine the current legislative models that put teachers at great risk of unfair evaluation, removal of tenure, and ultimately wrongful dismissal” (p. 5).

While Baker, Oluwole, and Green offer a detailed and evidence-based examination of the VAM-based and student growth model approaches to high-stakes teacher accountability, they ultimately place the weaknesses of reform policies in the context of potential challenges from teachers who believe they have been wrongfully evaluated or dismissed:

“In this section, we address the various legal challenges that might be brought by teachers dismissed under the rigid statutory structures outlined previously in this article. We also address how arguments on behalf of teachers might be framed differently in a context where value-added measures are used versus one where student growth percentiles are used. Where value-added measures are used, we suspect that teachers will have to show that while those measures were intended to attribute student achievement to their effectiveness, the measures failed to do so in a number of ways. That is, where value-added measures are used to assign effectiveness ratings, we suspect that the validity and reliability, as well as understandability of those measures would need to be deliberated at trial. However, where student growth percentiles are used, we would argue that the measures on their face are simply not designed for attributing responsibility to the teacher, and thus making such a leap would necessarily constitute a wrongful judgment. That is, one would not necessarily even have to vet the SGP measures for reliability or validity via any statistical analysis, because on their face they are invalid for this purpose.”

The analysis ultimately discredits both the use of narrow metrics to determine teacher quality and the high-stakes policies being implemented using those metrics, concluding with the ironic consequences of these policies: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers” (pp. 18-19).

In “Legal Issues in the Use of Student Test Scores and Value-added Models (VAM) to Determine Educational Quality” (Pullin, 2013), the rapid increase of VAM-based accountability is further examined in the context of “a wide array of potential legal issues [that] could arise from the implementation of these programs” (p. 2).

Pullin notes the motivation for reforming teacher evaluation:

“VAM initiatives are consistent with a highly publicized press from the business community and many politicians to make government services more like private business, data-driven to measure productivity and accountability (Kupermintz, 2003). VAM approaches are in part a response to concerns that the current system of selecting and compensating teachers based their education and credentials is insufficient for insuring teacher quality (Corcoran, 2011; Gordon, Kane & Staiger, 2006; Hanushek & Rivkin, 2012; Harris, 2011). There have been increasing expressions of concern that teacher evaluation practices are not robust and do not improve practice (Kennedy, 2010). In the contemporary public policy context, much of the support for the use of student test scores for educator evaluation comes from a concern that the current system for evaluation is ineffective and that the current legal protections for teachers are too cumbersome for schools seeking to terminate teachers (Harris, 2009, 2011).”

While a business model for addressing quality control of a workforce may seem efficient, Pullin highlights that legal ramifications are likely with these new models.

Pullin’s analysis offers a detailed and useful examination of previous court cases involving the use of test scores to evaluate educators, including recent cases involving VAM. Pullin concludes that how the courts will rule in the future remains unclear, but that a pattern exists of “heavy judicial deference to state and local education policymakers and the allure of using test scores to make decisions about education quality” (p. 5).

Pullin adds that “there are differences of perspective among social scientists about VAM and the defensibility of using it to make high-stakes decisions about educators,” which further complicates the prospect of legal action (p. 9).

While raising many other complications, Pullin also notes that students and parents may enter legal battles using VAM metrics “to substantiate their own legal claims that schools are not meeting their obligations to provide education” (p. 14).

Pullin concludes with a sobering look at teacher quality reform built on VAM and implemented in high-stakes environments:

“In the broad contemporary public policy context for education reform, the desire for accountability and transparency in government, coupled with heavily financed criticisms of public school teachers and their unions, may mean that VAM initiatives will prevail. The concerns of education researchers about VAM, coupled with legal obligations for the validity and reliability of education and evaluation programs should require judges and education policymakers to take a closer look for future decision-making. At the same time, the social science research community should be generating substantial new and persuasive evidence about VAM and the validity and reliability of all of its potential uses. For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.” (p. 17)

Like the NFL, federal and state governments may soon be compelled to reform the reform movement under the threat of legal action from a variety of stakeholders, because the science of teacher evaluation remains far behind the curve of implementation, particularly when teacher evaluation is high-stakes and based on VAM and other metrics linked to student test scores.

The special issue from EPAA is yet another call for political leadership to pause if not end wide-scale teacher evaluation and retention models that pose legal, statistical, and funding challenges that those leaders appear unwilling to acknowledge or address.

Teacher Quality Mania: Backward by Design

Let’s return to the allegory of the river.

Throughout the Land, people discovered babies floating in the river. A few were chosen to save those babies. While many survived, too many babies perished.

Technocrats, Economists, and Statisticians gathered all the Data that they could and discovered that at least 60% of the reason the babies survived or perished in the river was due to babies being tossed in the river; about 10-15% of the reason babies survived or perished was due to the quality of those trying to save babies in the river.

So the Leaders of the Land decided to focus exclusively on increasing the quality of those trying to save the babies floating in the river, saying, “There is nothing we can do about babies being tossed in the river, and there are no excuses for not saving these babies!”

And so it goes…

While the altered tale above reads like a dystopian allegory, it is a fair and accurate portrayal of the current mania to address teacher quality, a mania that simply has the entire reform process backward.

First, the body of research shows a clear statistical pattern about the array of factors influencing measurable student outcomes, as summarized by Di Carlo:

But in the big picture, roughly 60 percent of achievement outcomes is explained by student and family background characteristics (most are unobserved, but likely pertain to income/poverty). Observable and unobservable schooling factors explain roughly 20 percent, most of this (10-15 percent) being teacher effects. The rest of the variation (about 20 percent) is unexplained (error). In other words, though precise estimates vary, the preponderance of evidence shows that achievement differences between students are overwhelmingly attributable to factors outside of schools and classrooms (see Hanushek et al. 1998; Rockoff 2003; Goldhaber et al. 1999; Rowan et al. 2002; Nye et al. 2004).

When educators and education researchers note that teacher quality is dwarfed by other factors, primarily out-of-school factors associated with affluence and poverty, Corporate and “No Excuses” Reformers respond with straw-man arguments: that quoting statistical facts amounts to saying teachers cannot have an impact on students, or that quoting those facts is simply an excuse for not trying to educate all students (see Larry Ferlazzo and Anthony Cody for examples of this phenomenon in the debate over teacher quality).

To be clear, however, the problem is not that teacher quality doesn’t matter or that teachers do not want to be evaluated or held accountable. The problem is that a single-minded focus on teacher quality is self-defeating because, as the altered allegory above shows, it gets the priorities of reform backward.

Teacher quality reform should occur, but it must come after the primary factors impacting learning and teaching conditions are addressed, making valid and reliable evaluations of teacher quality possible. That process should be:

(1) Address first and directly the inequity of opportunity in the lives of children to create the conditions within which schools/teachers can succeed and thus school and teacher quality can be better evaluated and supported. As stated in a recent review of misleading “no excuses” and “miracle” school claims: “Addressing out-of-school factors is primary and fundamental to resolving education inequality” (Paige, 2013, January).

(2) Address next equity and opportunity within schools. Teaching conditions must be equitable in all schools and for all students. Currently, affluent and successful students have the most experienced, certified teachers and sit in AP and IB classes with low student/teacher ratios, while poor and struggling students have new and un- or under-certified teachers and sit in classes with high student/teacher ratios that focus primarily on test prep. Inequitable teaching/learning conditions actually mask our ability to identify quality teachers.

(3) Then, once out-of-school equity and in-school equity (focusing on teaching and learning conditions) have been addressed, teachers must be afforded autonomy; finally, we can gather credible evidence to begin identifying valid teacher quality metrics to inform evaluating, supporting, and retaining teachers.

The first and second priorities can be implemented simultaneously and immediately, with the third priority delayed until conditions are equitable enough to make authentic assessments of teacher impact on student learning. [And regardless, everyone involved in teaching and learning can and must continue to teach as well as possible; that is a given.]

Current arguments that only teacher quality matters are neither statistically accurate nor an effective reform priority.

Current arguments that only teacher quality matters are a frantic effort to save the babies floating in the river while ignoring the real crisis of babies being thrown in the river in the first place.

Between Educational Research and the Public, a Cloud of Misinformation

Walt Gardner, blogging at Education Week, has posted “Esoteric Formulas and Educational Research,” concluding (with a focus on the complex formulas used in pursuit of value-added methods of evaluating teachers):

The point is that we are too accepting of research that relies heavily on esoteric formulas. I want evidence to support conclusions about educational issues. But the evidence has to be understandable. Just as legal contracts now are increasingly written with consumers in mind, I hope that educational studies will do the same in the future. Taxpayers are entitled to know if students are being well taught, but they can’t make that judgment when they are given incomprehensible data.

I would suggest that the greatest problem related to educational research is that a cloud of misinformation exists between good educational research/data and the public, and that this cloud is created by political leaders, think tank advocacy groups, and the media [1], who either do not understand statistics or purposefully misuse them. I also believe that some see the world only through a technocratic lens (such as the pursuit of VAM), which is also a huge failure to apply appropriate paradigms in different contexts. Larry Ferlazzo has recently cited Nate Silver, who recognizes VAM as misguided: “There are certainly cases where applying objective measures badly is worse than not applying them at all, and education may well be one of those.”

Democracy and the market both work best for the public good when the public and consumers are informed. Political leaders, think tanks, and the media do no one any good by being continually inept (and dishonest) in their use and misuse of research to drive political agendas or advance their own brands.

For some excellent resources that confront how badly educational research is portrayed to the public, see the following:

Bracey, G. W. (2006). Reading educational research: How to avoid getting statistically snookered. Portsmouth, NH: Heinemann.

Molnar, A. (2001, April 11). The media and educational research: What we know vs. what the public hears. Milwaukee, WI: Center for Education Research, Analysis, and Innovation. Retrieved from http://epsl.asu.edu/epru/documents/cerai-01-14.htm

Yettick, H. (2009). The research that reaches the public: Who produces the educational research mentioned in the news media? Boulder, CO and Tempe, AZ: Education and the Public Interest Center & Education Policy Research Unit. Retrieved from http://epicpolicy.org/publication/research-that-reaches

[1] See chapters on the media in two of my most recent books: Parental Choice? and Ignoring Poverty in the U.S.