Conditions v. Outcomes: More on What’s Wrong with Teacher Education (and Accountability)? pt. 2

After posting What’s Wrong with Teacher Education?, I received responses that are fairly represented by the comments at the original post from Peter Smyth and psmagorinsky (Peter Smagorinsky). For full disclosure, these two Peters are acquaintances whom I respect a great deal, and thus I take their comments quite seriously.

To Peter Smyth’s concern (voiced by a few others offering feedback), I can clarify that my original post is a rejection of certification and a call for rich and deep education degrees; thus, my argument in no way endorses Teach for America or other alternative certification programs that inherently avoid and marginalize education degrees (such programs are, in fact, the antithesis of my argument).

Peter Smagorinsky’s comment—notably “At the same time, I think that if we are constructed as being against being accountable for our teaching, we not only lose the PR battle, we are dodging responsibility for the end result of our teacher education”—requires a bit more explanation, so allow me to offer a series of personal anecdotes to make my case.

The summer of 1975 was traumatic for me and my family because I was diagnosed with scoliosis, which required my parents to pay for, and me to wear, an elaborate and expensive back brace. This ordeal lasted from ninth through twelfth grade.

Setting aside the personal angst of wearing a large back brace during my gangly and painfully self-conscious teen years, I have written before that this experience with scoliosis became the breeding ground for my extensive comic book collection as well as the many hours and years I spent teaching myself to draw.

Since the brace made sitting nearly impossible, I began to stand at the end of the long bar that separated my family’s kitchen and living room. There I at first traced my favorite comic book superheroes from my collection; soon I began drawing freehand. Eventually, I was drawing large versions of entire comic book panels and dramatic scenes, first carefully creating the artwork in pencil and then inking it, mirroring the comic book process (I even did some coloring over the years, again mimicking the comic book industry).

Over about 5 or 6 years, I became a fairly accomplished artist, branching out beyond comic book artwork to realistic pencil drawings (often from photographs). For the purposes of this blog post, I want to emphasize that at no point did I take any formal courses or have a teacher of any kind in visual art.

I read and studied comic books, I researched how comic book art was created, and I bought a few art books, mostly large books of sketches to use as practice.

During the same teenage years I spent collecting comic books, teaching myself to draw, and contemplating a career as a comic book artist, I grew up on a golf course, where I worked (both in the clubhouse as an assistant and at the pool as a lifeguard). I also spent many hours of my life hitting range balls (often 300 at a time) and playing 18-27 holes of golf many days each week.

Yes, I also contemplated the life of the professional golfer.

While in college, I secured an assistant pro job at a different golf course, where I spent a good deal of time talking with two professional golf instructors. These men gave golf lessons on the course driving range and sometimes on the course itself.

One golf pro had never had a career as a touring pro, and I was able to shoot scores similar to his. The other had briefly played on the tour in the Ben Hogan era, but his promise of a tour career was cut short by a car accident.

From talking with these two golf instructors and watching their work and their students, I recognized something incredibly important: Most of the people taking lessons stayed about the same in their ability to score on the golf course. The older golf instructor often told me directly that he could teach anyone the proper grip and motions of a golf swing, but beyond that, how any person actually played golf was not something he could teach or control.

In learning to play golf, technique, physical aptitude, practice, and more were all intricately intertwined. Few people practiced or played as much golf as I did in those years, and yet I was never going to be a touring professional. Never. (Likely, I was also never going to be a professional comic book artist.)

About twenty years after those teenage and early-twenties years, I had become a public school English teacher; my life was steeped in reading and writing (now traceable to those comic books I was also reading voraciously, along with science fiction).

A few years after receiving my EdD, I was fortunate to be the lead instructor for the Spartanburg Writing Project in its summer institutes for teachers. In that first summer, a beginning teacher, Dawn Mitchell (who would go on to teach and work for SWP as well as serve as an adjunct where I now work in teacher education), and I began working on her efforts to write poetry. Dawn was a wonderful teacher, a gifted writer of prose, and an eager as well as frequent reader.

When I read her poem drafts, however, I felt she had not attained the same genre/form awareness about poetry that she displayed about prose.

I had been writing poetry since my freshman year of college, had published a fair number of poems (see “horea,” “Mary (sea of bitterness),” and “quilting”), and had been teaching high school students to write poetry for almost twenty years. Four of my high school students’ poems were included in one of my earliest articles in English Journal, in fact (see Ashley Mason and Leigh Hix here; Lauren Caldwell and Kris Harrill here).

The summer institute workshop format allowed Dawn and me an ideal opportunity for examining how to develop poetic sensibilities. And Dawn’s work as a poet soon rose to the fine level of her prose.

While Dawn was growing as a writer and poet, I too was learning to hone my craft not as a poet, but as a teacher of writing poetry—developing the ability to mine craft from reading poetry and helping writers transfer those craft lessons into their original work.

Of all the things I teach, I remain convinced that teaching someone to write poetry is possibly my most refined skill.

That said, I can never claim that my teaching produces a poet; poems are acts in themselves that must be viewed as their own evidence of quality.

What does all this have to do with what’s wrong with teacher education, broadly, and Smagorinsky’s concern, narrowly?

First, teachers and formal teaching are important, but neither necessary nor easily defined, aspects of learning, especially as that learning manifests itself in observable outcomes; my learning to draw is but one example.

Thus, seeking to identify direct, isolated, causal relationships among teachers, teaching, learning, and observable learning outcomes is simplistic and fundamentally misrepresents each of these.

A learner can produce outstanding outcomes with no teacher involved. A poor teacher can be involved when a learner produces outstanding outcomes. And a brilliant teacher can be involved when a learner produces weak outcomes.

Why?

Because a teacher of anything has control only over the conditions of the learning experience, as my second and third examples show.

Golf instructors and teachers of writing poetry can never promise skilled golfers or brilliant poets. Many other elements besides the teachers or the teaching are involved—and such is the case with all teaching and learning.

And therein lies my essential disagreement with continuing to focus on learner outcomes when seeking accountability for teachers and teaching.

How did I teach myself to draw? All of the conditions necessary were provided or occurred—incredibly supportive parents who bought the comic books and art supplies, my own unfortunate situation with scoliosis, my fortuitous discovery of a proclivity for visual art, and my own intrinsic motivation that fueled my hours and hours of practice. (By the way, I think I would have benefited greatly from a professional teacher, but the conditions in which I taught myself are evidence of how important conditions are in contrast to a teacher.)

In the larger picture, however, elite golfers, visual artists, and poets cannot be taught to be elite. A substantial number of unpredictable elements are involved, and direct teaching and teachers are important but not even necessary.

Learner outcomes are simply not credible artifacts for teacher or teaching quality.

Teacher education (and teaching accountability) must set aside that paradigm of accountability, and begin to focus on the conditions of teaching instead.

Admitting that teacher education cannot guarantee the quality of the teachers its programs produce is not a cop-out. It is the same as the golf instructor who, despite his best efforts, cannot guarantee golfer quality, or the teacher of poetry who cannot guarantee a poet.

By continuing to pretend that teacher quality is the most important element in student learning, we are in fact devaluing and misrepresenting the importance of teachers and teaching.

What’s Wrong with Teacher Education?

I belong to two communities that are central to my life—educators and cyclists.

So when a cyclist and friend sent me an article on the importance of how cyclists conduct themselves as groups on the roads, I was struck by the opening quote included by the writer, Richard Fries:

“We have met the enemy and he is us.”  –Walt Kelly, Pogo

Immediately, the spirit of the article—many times motorist antagonism toward cyclists can be traced to cyclist behavior—resonated with me as someone who has been cycling seriously for about 30 years, including a great deal of time and effort spent posting and leading group rides. But the sentiment of this piece on group cycling also spoke to me as a teacher and teacher educator because when I ponder what is wrong with teacher education, I notice that the enemy is often us—teachers and teacher educators.

Gerardo M. Gonzalez, dean of the school of education at Indiana University Bloomington, examines the current state as well as the political and public perception of teacher education in Defining Teacher-Prep Accountability:

Much has been written and discussed of late about the debate over the best method of assessing teacher-preparation programs. As the dean of the school of education at Indiana University Bloomington, I understand that meaningful assessment of teacher preparation requires a multifaceted approach based on a robust research methodology and focused on program outcomes. A sound study, as researchers know, begins with a viable research question. The design and method of data collection then flow from that question. Moreover, the scientific validity of conclusions reached on the basis of the data depends on the ethical application of research principles and, where appropriate, validation of results through peer review and replication.

Two important aspects of Gonzalez’s commentary occur in the opening: He acknowledges the impact and influence of the National Council on Teacher Quality (NCTQ) and then takes a firm stand against NCTQ’s reports and methodologies.

NCTQ’s reports have received essentially free passes from the mainstream press but have been discredited in detail by researchers, educators, and bloggers. That dynamic is a powerful picture of the larger context of what is wrong with teacher education.

First, teacher education (like public schools and public school teachers) is not failing in the ways claimed by NCTQ—or other think tanks, political leaders and appointees, and the mainstream media.

Second, the noise created by NCTQ and others promoting misinformation masks the very real ways in which teacher education is failing (and, again, this parallels a similar pattern found in education reform more broadly; see An Alternative to Accountability-Based Education Reform).

While I applaud Gonzalez and Indiana University for taking a politically unpopular but credible and evidence-based stance against NCTQ (too few in teacher education took that stand), the last part of Gonzalez’s commentary reveals just what is wrong with teacher education.

In the outline offered by Gonzalez, accountability based on standards and outcomes is, once again, reinforced:

If I were to design a study to hold preparation programs accountable for their graduates’ performance, as the group Teach Plus Indianapolis has challenged me to do, I would start with the question of whether a given teacher-preparation program produces graduates who can work effectively in school classrooms to increase student learning and achieve other valued educational outcomes. Then, I would select or create appropriate measures of student learning and related educational outcomes, as well as ways to assess teacher effectiveness on the impact of those measures.

And therein lies the problem.

What’s wrong with teacher education? In brief, the problem with teacher education is the maze of bureaucracy that constitutes certification and accreditation.

And that maze of standards (and the perpetual changing of those standards) feeds a misguided overarching paradigm: accountability linked to outcomes.

In both education reform and teacher education, accountability is misguided and causes more harm than good, notably because the traditional accountability paradigm seeks to hold one agent accountable for the outcomes of other agents, whether that means teachers held accountable for student test scores or colleges/departments of education held accountable for the test scores of their graduates’ students.

That accountability fails because the focus is on outcomes, and those outcomes are outside the control of the agent being held accountable.

Additionally, since that accountability is flawed, those agents being held accountable are reduced to documenting meticulously that they have served the standards as a defense against their inability to control the outcomes.

The result is dysfunctional: teacher educators and teachers spend too much time correlating their lessons and assessments with standards (and not enough time studying the content of their fields and the needs of their students), and then waste a tremendous amount of time completing the external mandates related to certification and accreditation.

Gonzalez mentions the Council for the Accreditation of Educator Preparation (CAEP), which ironically represents the fundamental flaw in the entire accreditation process, since this organization is a new version of two earlier accreditation organizations. Accreditation (like certification) is a minefield of ever-moving targets, a bureaucratic process for the sake of being bureaucratic. In fact, the only constant in the worlds of certification and accreditation is that both perpetually change, always in pursuit of the right (or next) standards.

CAEP will serve teacher education no better than Common Core will save K-12 public education. We have decades of evidence that these processes have never worked, and we have no evidence that anything different will happen this time around (except that the new elements, such as VAM, are guaranteed to increase the harm).

Again, the failure of teacher education is in the bureaucracy of accountability, standards, and focusing on outcomes. The solution, then, would be for teacher education to embrace the foundational aspects of the disciplines.

I have stated this before, but it is worth repeating: Every moment I have spent achieving certification has been a waste of my time; every moment spent in rich and engaging education courses and programs has been infinitely valuable. For example, the road to certification as an undergraduate was disappointing (except for some excellent professors), and that contrasts strongly with my doctoral program, which included no certification requirements and was the single most important element in my path to being an educator.

As an undergraduate, I learned to be a bureaucrat; as a graduate student, I learned to be a scholar.

I think even the best among us in the field of education remain trapped in a low self-esteem mindset: we are afraid, because we know this is what other disciplines say about education, that we are in fact not a real field of study; therefore, we manufacture the most complex systems imaginable to make our field seem valuable, “rigorous,” professional. And thus:

“We have met the enemy and he is us.”  –Walt Kelly, Pogo

Certification and accreditation are mind-numbingly complicated, I fear, as a sort of low-self-esteem theater. The maze of standards, rubrics, data charts, and reports surely proves that we are a complex field, that we are working hard?

That is nonsense for two reasons: (1) all the bureaucracy of certification and accreditation confirms the worst slurs against education as a field, and (2) the field of education is a rich and credible discipline, if only we’d trust that and embrace it.

So allow me to end with an anecdote.

After 18 years as a high school English teacher, I entered higher education and teacher education. Soon afterward, I asked if I could be spared to teach an occasional freshman composition course (my first love). Although the politics of an education professor (with an EdD, no less) teaching in the English department were more treacherous than I anticipated, I was finally allowed one section.

When I met with the English department chair to discuss the course, I asked to see a sample syllabus. The chair, at first, seemed puzzled, but he did shuffle through his desk and around his office until he found a couple.

One syllabus filled the front of a single page; the other, the front and part of the back of a single page.

My syllabus for the introductory education course I taught was 17 pages.

The field of education, including teacher education, is mired in bureaucracy, I fear, because we do not trust ourselves; we do not trust ourselves in the way that faculty in chemistry, English, and history do right on our own campuses all across higher education.

We are our own worst enemies when we persist down the accountability road, demanding standards, rubrics, data charts, and the external review of bureaucratic agencies to whom we abdicate the responsibility of bestowing certification on candidates and accreditation on departments and colleges because we do not trust our field or ourselves.

Recommended

“We Brought It Upon Ourselves”: University-Based Teacher Education and the Emergence of Boot-Camp-Style Routes to Teacher Certification, Daniel Friedrich

Smagorinsky on Authentic Teacher Evaluation

In the mid-nineteenth century, public schools were under attack by the Catholic Church; Bishop John Hughes “described the public schools as a ‘dragon…devouring the hope of the country as well as religion’” (Jacoby, pp. 257-258). Throughout the twentieth century, the political and public messages were much the same: public education was a failure.

Ironically, the rhetorical math has never added up: U.S. prosperity and international competitiveness depend on world-class public schools + U.S. public schools are failures = the U.S. dominates the world economically and/or “U.S.A. is number 1!”

As the school accountability era began in the early 1980s, the “public school as failure” mantra began to focus on low-performing schools and underachievement by students—as the early wave of accountability focused primarily on schools (including school report cards for the public) and exit exams as well as increased high-stakes testing for students.

The twenty-first century has added to the accountability target a new focus on the “bad” teacher. As a result, teachers; educational historians, scholars, and researchers; and public school advocates have been forced into a corner, reduced to nearly a monolithic reactionary voice of rebuttal.

That position of reaction has drawn fire—charges of inappropriate tone, defense of the status quo, masked self-interest, and a failure to offer alternatives.

That last point is important, especially in the debate over teacher evaluation that has seen a rise in value-added methods (VAM) of teacher evaluation and a resurgence in merit pay policies despite both practices being at least tempered if not refuted by a growing body of research.

Peter Smagorinsky’s “Authentic Teacher Evaluation: A Two-Tiered Proposal for Formative and Summative Assessment” (English Education, 46(2), 165-185) offers an important occasion to acknowledge that the field of education, in fact, has numerous evidence-based alternatives to the reform agenda and to highlight the reasons why those alternatives remain mostly absent from the public debate.

First, I highly recommend reading Smagorinsky’s entire piece, but that recommendation raises an important point about why evidence-based reform policies coming from educators, scholars, and researchers tend to carry little weight in the political and public debate about schools: high-quality, peer-reviewed scholarly work tends to be inaccessible except to fellow researchers and subscribers to relatively obscure journals.

Thus, as Smagorinsky himself notes in this essay about his increased public work as an academic, I want to touch below on some of the most important points he offers as an authentic alternative for teacher evaluation.

Teacher Evaluation, Much More than What We Can Measure

Smagorinsky begins by noting the inherent failure of focusing heavily on measurable teaching and learning, an argument well supported by research but which appears to fall on deaf ears among politicians and the public.

Key in his proposal is refuting a false narrative, notably coming from Eric Hanushek, that teachers reject being held accountable. This is a powerful and important point that must be clearly understood.

The current approach to accountability in education includes holding teachers responsible for external mandates and teaching conditions that they have not created as well as measurable outcomes (student scores on high-stakes standardized tests) that are mostly out of their control (teacher impact on measurable student outcomes is about 10%, dwarfed by the influence of out-of-school factors, between about 60-80%).

As Smagorinsky notes, educators are rejecting that flawed approach to accountability and calling instead for professional accountability, which begins with teacher autonomy and includes holding teachers accountable for only that over which they have control (which is not measurable student outcomes).

If teacher evaluation policies, he explains, focused more on the conditions of teaching and learning—increasing the likelihood that both teachers and students can succeed—and less on punitive practices (such as firing the bottom X% based on VAM rankings, as Hanushek and Bill Gates have proposed), many of the goals for improved teacher quality and student achievement could be met.

Another key shift suggested by Smagorinsky is significantly reducing the amount of high-stakes testing (every 3-5 years, for example) included in teacher evaluations, both in recognition of the inordinate cost associated with testing (we rarely note that fully implemented VAM-like teacher evaluations would require pre-/post-tests of every student in every class taught in order to be fair and consistent among all teachers) and of the validity and reliability concerns that remain for VAM-based evaluations. This compromise is similar to the one offered by Stephen Krashen, who has argued against implementing Common Core and the so-called next-generation high-stakes tests and for using the sampling process already in place with NAEP.

Teaching is a social activity within and for a community, and Smagorinsky envisions teacher evaluation that is more than a number, including a wide range of stakeholders. This point reminds me of the use of the SAT in college admissions. When discussing the weight of SAT scores with a dean of admissions, he pointed out that even when SAT scores are weighted less in admissions formulas, most of the other categories cancel each other out (as they are similar) and SAT, although a lower proportion in the formula, essentially remains the gatekeeping data point.

Even a small percentage of VAM, then, can prove decisive in teacher evaluations that are not aggressively nuanced and multifaceted, including expanded input from most if not all stakeholders in the teaching of children.
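The dean’s observation, and its extension to VAM, is at bottom a point about arithmetic: how much a component drives a ranking depends on how much that component varies across people, not only on its nominal weight. A minimal sketch in Python can make this concrete. Everything in it is hypothetical: the 80/20 split between observation ratings and VAM, the teacher labels, the shared 1-4 scale, and every score are invented for illustration and are not drawn from Smagorinsky, Haertel, or any real evaluation system.

```python
# Hypothetical illustration only: the weights, teachers, and scores are invented.
# Composite = 80% observation rating + 20% VAM score, both assumed to be
# rescaled to a common 1-4 scale.
OBSERVATION_WEIGHT = 0.8
VAM_WEIGHT = 0.2

# Observation ratings cluster tightly (3.4 to 3.6), so they barely separate
# teachers; VAM scores spread across the whole scale (1 to 4).
teachers = {
    "Teacher A": {"observation": 3.5, "vam": 1.0},
    "Teacher B": {"observation": 3.5, "vam": 4.0},
    "Teacher C": {"observation": 3.6, "vam": 2.0},
    "Teacher D": {"observation": 3.4, "vam": 3.0},
}


def composite(scores: dict) -> float:
    """Weighted composite evaluation score for one teacher."""
    return OBSERVATION_WEIGHT * scores["observation"] + VAM_WEIGHT * scores["vam"]


def spread(component: str) -> float:
    """Range (max minus min) of one component across all teachers."""
    values = [s[component] for s in teachers.values()]
    return max(values) - min(values)


# Rank teachers from highest to lowest composite score.
composite_ranking = sorted(teachers, key=lambda t: composite(teachers[t]), reverse=True)
# Rank teachers by VAM alone, for comparison.
vam_ranking = sorted(teachers, key=lambda t: teachers[t]["vam"], reverse=True)

# Weighted spread: the most that each component can separate two teachers.
print("observation weighted spread:", round(OBSERVATION_WEIGHT * spread("observation"), 2))
print("VAM weighted spread:", round(VAM_WEIGHT * spread("vam"), 2))
for name in composite_ranking:
    print(name, round(composite(teachers[name]), 2))
print("Composite ranking matches VAM-only ranking:", composite_ranking == vam_ranking)
```

Because the invented observation ratings barely differ (a weighted spread of 0.16), while VAM, at only one-fifth of the weight, still has a weighted spread of 0.6, the composite ranking simply reproduces the VAM ranking. In any real system the result would depend on how much each component actually varies, which is exactly why the dean’s point deserves scrutiny rather than assumption.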

While much of Smagorinsky’s discussion includes the complex details that should be involved in teacher evaluations (and thus, I recommend reading his essay in its entirety), a few key points can serve to conclude this consideration:

  • Teacher evaluations should be designed as “some form of mediated discussions, with artifacts from teaching serving as the basis of the conversation” (p. 171). Important here is a process that sets aside hierarchy for dialogue, seeks teacher growth instead of ranking and punishing, and builds a consensus on a rich body of evidence (artifacts) instead of reductive metrics.
  • Teacher evaluations should address the entire spectrum of teacher influence, not restricted to classroom and content only. In short, Smagorinsky highlights that being a teacher is more than lessons presented to her/his students.
  • Implicit in Smagorinsky’s discussion, I would add, is that teacher evaluation should not continue the assumption that teacher impact is solely between one teacher and one student. Learning results from the input of many people in a child’s life; teacher evaluations should acknowledge collaboration as well as individual competence and impact.

“I again return,” Smagorinsky notes toward the end, “to the idea that what matters is how well a teacher can justify an instructional approach and relate it to student work” (p. 182), and, I would add, to demonstrated student need. Teacher quality should not be about teachers fitting a prescribed mold, but about the professional efficacy of a teacher’s practices in the context of the field and students that the teacher teaches.

While I recommend Smagorinsky’s proposal because he emphasizes that “[t]eaching and learning are human pursuits” (and thus, likely unmeasurable in any meaningful way), I also want to stress that this essay is but one piece of evidence that the field of education already knows what to do.

Common Core, VAM-style teacher evaluation, and the entire array of education reform are ultimately misguided because they are commentaries based on flawed assumptions about the field of education.

Despite the education bashing that has occurred in the U.S. for over 100 years, we in fact know what to do—if only politicians, pundits, and bureaucrats could see fit to get out of the way and allow the opportunity to prove it.

Until that happens, we educators must begin to make our case for alternatives to misguided reform, and in ways that are accessible to all stakeholders in public schools.

VAMboozled by Empty-Suit Leadership in SC

Rep. Andy Patrick, R-Hilton Head Island (SC), has made two flawed claims recently, one about leadership and another about teacher evaluation (“S.C. lawmaker proposes teacher evaluation plan,” Charleston Post and Courier, December 10, 2013).*

First, and briefly, Patrick asserts that SC needs leadership for superintendent of education, discounting the importance of experience or expertise. As I will address below, Patrick’s lack of experience and expertise is, ironically, evidence that leadership is not enough. In fact, leadership begins with experience and expertise; it doesn’t replace those essential qualities.

Next, and more importantly, Patrick’s and current Superintendent Mick Zais’s pursuit of test-based teacher evaluation reform is deeply flawed and discredited by research on value added methods (VAM) of evaluating teachers.

Endorsing VAM-heavy teacher evaluation joins grade retention, charter schools, and Common Core as a series of policy decisions in SC that are countered by the research base—resulting in a tremendous waste of time and funding that should be better spent for our students and our state.

For example, Edward H. Haertel’s Reliability and validity of inferences about teachers based on student test scores (ETS, 2013) now offers yet another analysis that details how VAM fails, again, as a credible policy initiative. Haertel’s analysis offers the following:

  • First, Haertel addresses the popular and misguided perception that teacher quality is a primary influence on measurable student outcomes. As many researchers have detailed, teachers account for about 10% of student test scores. While teacher quality matters, access to experienced and certified teachers as well as addressing out-of-school factors dwarf narrow measurements of teacher quality.
  • Next, Haertel confronts the myth of the top quintile teachers, outlining three reasons that arguments about those so-called “top” teachers’ impact are exaggerated.
  • Haertel also acknowledges the inherent problems with test scores and what VAM advocates claim they measure—specifically that standardized tests create a “bias against those teachers working with the lowest-performing or the highest performing classes” (p. 8).
  • The next two sections detail the logic behind VAM as well as the statistical assumptions in which VAM is grounded, laying the basis for Haertel’s main assertion about using VAM in high-stakes teacher evaluations.
  • The main section of the report reaches a powerful conclusion that matches the current body of research on VAM:

These 5 conditions would be tough to meet, but regardless of the challenge, if teacher value-added scores cannot be shown to be valid for a given purpose, then they should not be used for that purpose.

So, in conclusion, VAM may have a modest place in teacher evaluation systems, but only as an adjunct to other information, used in a context where teachers and principals have genuine autonomy in their decisions about using and interpreting teacher effectiveness estimates in local contexts. (p. 25)

  • In the last brief section, Haertel outlines a short call for teacher evaluations grounded in three evidence-based “common features”:

First, they attend to what teachers actually do — someone with training looks directly at classroom practice or at records of classroom practice such as teaching portfolios. Second, they are grounded in the substantial research literature, refined over decades of research, that specifies effective teaching practices….Third, because sound teacher evaluation systems examine what teachers actually do in the light of best practices, they provide constructive feedback to enable improvement. (p. 26)

Haertel’s concession that VAM has a “modest” place in teacher evaluation is no ringing endorsement, but it certainly refutes the primary—and expensive—role that VAM is playing in proposals to reform teacher evaluation in SC and across the U.S.

Would SC benefit from focusing on teacher quality, as well as ensuring all children have equitable access to experienced and certified teachers? Absolutely.

But current calls by leaders with no experience or expertise in education are undermining that possibility by rushing to implement policy that is contradicted by a growing body of research discounting the value of VAM as a key element of teacher evaluation.

SC students, teachers, and schools cannot afford doubling-down on a failed test-based education culture, and certainly, SC cannot afford more leadership without expertise, which is what Representative Patrick is offering.

* Submitted to the Charleston Post and Courier; unpublished there so far.

Please see the VAMboozled website for research refuting VAM.

VAM Fails Test, Again: The Bizarro World of Education Reform

The great state of South Carolina (and for full effect, you should hear that with “great” and “state” rhyming, sort of, with “pet” because that is how the good ol’ boy patriarchy says it around here) continues down a path all too familiar across the U.S.: adopt any and all education reform policies that other states are rushing to implement, even (and maybe especially) when research fails to support the practices.

I have catalogued the inexcusable political and public support in SC for retaining third graders based on high-stakes testing scores, a policy directly linked to Just Read, Florida!

And despite equally ample evidence to the contrary about basing teacher evaluations on value-added methods (VAM), also a corrosive policy in Florida, Charleston, SC, is moving forward with BRIDGE, characterized by Peter Smyth as "A BRIDGE to I Have No Clue Where."

Public policy implementing grade retention, VAM, and lingering commitments to merit pay—just to name a few—continues to thrive in SC and across the U.S., seemingly as a bold-faced snub of the idealistic (and increasingly Orwellian) call in No Child Left Behind that education policy must be “scientifically based.”

Education Reform in Bizarro World

In the DC Universe, Superman has often encountered Bizarro World, Htrae. Education reform is no less bizarre with the political and public mania for policies that have been and continue to be refuted by large bodies of research.

For example, Edward H. Haertel’s Reliability and validity of inferences about teachers based on student test scores (ETS, 2013) now offers yet another analysis that details how VAM fails, again, as a credible policy initiative—with a few caveats*.

Briefly, the analysis by Haertel offers the following:

  • First, Haertel addresses the popular and misguided perception that teacher quality is a primary influence on measurable student outcomes. As many researchers have detailed, teachers account for only about 10% of the variance in student test scores, as shown in this graphic (see p. 5):

[Graphic: teacher influence on student test scores (Haertel, p. 5)]

  • Next, Haertel confronts the myth of the top quintile teachers (pp. 6-7*), outlining three reasons that arguments about those so-called “top” teachers’ impact are exaggerated.
  • Haertel also acknowledges the inherent problems with test scores and what VAM advocates claim they measure—specifically that standardized tests create a “bias against those teachers working with the lowest-performing or the highest performing classes” (p. 8).
  • The next two sections detail the logic behind VAM as well as the statistical assumptions in which VAM is grounded (pp. 9-13), laying the basis for Haertel’s main assertion about using VAM in high-stakes teacher evaluations.
  • The main section of the report, "An Interpretive Argument for Value-Added Model (VAM) Teacher Effectiveness Estimates" (pp. 14-25), reaches a powerful conclusion that matches the current body of research on VAM:

These 5 conditions would be tough to meet, but regardless of the challenge, if teacher value-added scores cannot be shown to be valid for a given purpose, then they should not be used for that purpose.

So, in conclusion, VAMs may have a modest place in teacher evaluation systems, but only as an adjunct to other information, used in a context where teachers and principals have genuine autonomy in their decisions about using and interpreting teacher effectiveness estimates in local contexts. (p. 25)

  • In the last brief section, Haertel outlines a short call for teacher evaluations grounded in three evidence-based “common features”:

First, they attend to what teachers actually do — someone with training looks directly at classroom practice or at records of classroom practice such as teaching portfolios. Second, they are grounded in the substantial research literature, refined over decades of research, that specifies effective teaching practices….Third, because sound teacher evaluation systems examine what teachers actually do in the light of best practices, they provide constructive feedback to enable improvement. (p. 26)

Haertel’s concession that VAM has a “modest” place in teacher evaluation is no ringing endorsement, but it certainly refutes the primary—and expensive—role that VAM is playing in the rush to reform teacher evaluation in SC and across the U.S.

In the irony of ironies that can occur only in the Bizarro World of education reform, each time VAM is tested, it fails, and each time it fails, more states line up to implement it.

* Haertel offers a more than generous analysis of the Chetty, Friedman, and Rockoff (2011) claim that teacher impact can be extrapolated into adult earnings for students. I urge readers to examine Bruce Baker’s and Matthew Di Carlo’s more nuanced and cautious analyses of those claims.

What We Know Now (and How It Doesn’t Matter)

Randy Olson’s Flock of Dodos (2006) explores the evolution and Intelligent Design (ID) debate that represents the newest attack on teaching evolution in U.S. public schools. The documentary is engaging, enlightening, and nearly too fair considering Olson admits upfront that he stands with scientists who support evolution as credible science and reject ID as something outside the realm of science.

Olson’s film, however, offers a powerful message that rises above the evolution debate. Particularly in the scenes depicting scientists discussing (during a poker game) why evolution remains a target of political and public interests, the documentary shows that evidence-based expertise often fails against clear and compelling messages (such as “teach the controversy”)—even when those clear and compelling messages are inaccurate.

In other words, ID advocacy has often won in the courts of political and public opinion despite having no credibility within the discipline it claims to inform—evolutionary biology.

With that sobering reality in mind, please identify what XYZ represents in the following statement about “What We Know Now”:

Is there a bottom line to all of this? If there is one, it would appear to be this: Despite media coverage, which has been exceedingly selective and misrepresentative, and despite the anecdotal meanderings of politicians, community members, educators, board members, parents, and students, XYZ have not been effective in achieving the outcomes they were assumed to aid….

XYZ is school uniform policies. The analysis was conducted by sociologist David L. Brunsma, who examined evidence on school uniform effectiveness (whether school uniform policies achieved the stated goals of those policies) “from a variety of data gathered during eight years of rigorous research into this issue.”

This comprehensive analysis of research from Brunsma replicates the message in Flock of Dodos—political, public, and media messaging continues to trump evidence in the education reform debate. Making that reality more troubling is that a central element of No Child Left Behind was a call to usher in an era of scientifically based education research. As Sasha Zucker notes in a 2004 policy report for Pearson, “A significant aspect of the No Child Left Behind Act of 2001 (NCLB) is the use of the phrase ‘scientifically based research’ well over 100 times throughout the text of the law.”

Brunsma’s conclusion about school uniform policies, I regret to note, is not an outlier in education reform but a typical representation of education reform policy. Let’s consider what we know now about the major education reform agendas currently impacting our schools:

Well into the second decade of the twenty-first century, then, education reform continues a failed tradition of honoring messaging over evidence. Neither the claims made about educational failures nor the solutions offered by education reform policy today are supported by large bodies of compelling research.

As the fate of NCLB continues to be debated, the evidence shows not only that NCLB has failed its stated goals, but also that politicians, the media, and the public have failed to embrace the one element of the legislation that held the most promise—scientifically based research—suggesting that dodos may in fact not be extinct.


Conservative Leadership Poor Stewardship of Public Funds

In South Carolina and across the U.S., conservative leadership of education reform has failed to fulfill a foundational commitment of traditional values: good stewardship of public funds. [1]

The evidence of that failed stewardship is best exposed in commitments to three education reform policies: Adopting and implementing Common Core State Standards (CCSS), designing and implementing new tests based on CCSS, and proposing and field-testing revised teacher evaluations based on value-added models (VAM).

SC committed a tremendous amount of time and public funding to the accountability movement thirty years ago as one of the first states to implement state standards and high-stakes testing. After three decades of accountability, SC, like every other state in the union, has declared education still lacking and thus once again proposes a new round of education reform primarily focusing on, yet again, accountability, standards, and high-stakes testing.

Almost absent from the political and public debate about committing to CCSS, new high-stakes tests, and teacher evaluation reform are any needs assessments and cost/benefit analyses of these policies.

More of the Same Failed Policies?

If thirty years of accountability has failed, why is more of the same the next course of reform? If thirty years of accountability has failed, shouldn’t SC and other states first clearly establish what the problems and goals of education are before committing to any policies aimed at solving those problems or meeting those goals?

Neither of these questions has been adequately addressed, yet conservative political leadership is racing to commit a tremendous amount of public funding and public workers’ time to CCSS, an increase in high-stakes testing never experienced by any school system, and teacher evaluation proposals based on discredited test-based metrics.

Just as private corporations have reaped the rewards of tax dollars in SC during the multiple revisions of our accountability system, moving through at least three versions of tests and a maze of reformed state standards, the only guaranteed outcomes of commitments to CCSS, new tests, and reformed teacher evaluations are profits for textbook companies, test designers, and private consultants—all of whom have already begun cashing in on branding materials with CCSS and the yet-to-be-designed high-stakes tests that will eventually be implemented twice a year in every class taught in the state.

SC as a state and as an education system is burdened by one undeniable major problem: inequity of opportunity in society and in schools, spurred by poverty.

Numerous studies in recent years have shown that schools across the U.S. tend to reflect and perpetuate inequity; thus, children born into impoverished homes and communities are disproportionately attending schools struggling against and mirroring the consequences of poverty.

Commitments in SC to CCSS, new high-stakes tests, and reforming teacher evaluations based in large part on those new tests are at their core poor stewardship of public funding in a state that has many more pressing issues needing the support of state government.

A further problem with conservative leadership endorsing these education reforms is that much of the motivation for CCSS, new tests, and reformed teacher evaluations comes from funding mandates by the federal government.

Misguided education reform is not only a blow to conservative economics but also a snub to traditional trust in local government over federal control.

Recently, as well, a special issue on VAM from Education Policy Analysis Archives (EPAA) includes two analyses that should give policy makers in SC and all states key financial reasons to pause if not halt commitments to education reform based on student test scores—the potential for legal action from a variety of stakeholders in education.

Baker, Oluwole, and Green explain: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers.”

Further, Pullin warns: “For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.”

Do schools across SC need education reform? Yes, just as social policy in the state needs to address poverty as a key mechanism for supporting those schools once they are reformed.

But in a state driven by traditional values and conservative political leadership, current commitments to CCSS, new high-stakes tests, and reforming teacher evaluations are neither educationally sound nor conservative.

[1] Expanded version of Op-Ed published in The State (Columbia, SC), March 8, 2013: “Conservatives poor stewards of education funds”

Daily Kos: Misreading Teacher Evaluation and Retention


The League of Women Voters of South Carolina has released “How to Evaluate and Retain Effective Teachers” (2011-2013), but this report misreads the evidence on teacher evaluation and thus distracts high-poverty states from needed educational reform. [1]

A review of the report shows that it does not establish a clear problem with teacher quality in SC and that it misrepresents the current body of research on teacher evaluation, particularly value-added methods (VAM) of evaluation.

As a high-poverty and racially diverse state, SC is similar to many other states facing educational hurdles, but those hurdles have less to do with identifying and ranking teacher quality and more to do with the inequitable distribution of teachers. Children of color, children in poverty, English language learners, and special needs students are taught disproportionately by inexperienced and un-/under-certified teachers. SC and other high-poverty states would do well to address teacher assignment and teaching conditions before experimenting with new teacher evaluation systems.

Ultimately, this report misreads and misrepresents the current understanding of how to evaluate and determine teacher quality—specifically through test-based methods.

continue reading at Daily Kos

NFL again a Harbinger for Failed Education Reform?

During the impending NFL strike in 2011—the act of a union—I drew a comparison between how the public in the U.S. responds to unionization in different contexts:

“I am speaking about the possible NFL strike that hangs over this coming Super Bowl weekend: a struggle between billionaires and millionaires, which, indirectly, shines an important light on the rise of teacher and teacher union-bashing in the US. Adam Bessie, in Truthout, identifies how the myth of the bad teacher has evolved.”

Once again, the NFL is facing a situation that I believe and even hope is another harbinger of how education reform can be halted: a suit filed by the family of Junior Seau:

“The family said the league not only ‘propagated the false myth that collisions of all kinds, including brutal and ferocious collisions, many of which lead to short-term and long-term neurological damage to players, are an acceptable, desired and natural consequence of the game,’ but also that ‘the N.F.L. failed to disseminate to then-current and former N.F.L. players health information it possessed’ about the risks associated with brain trauma.”

This lawsuit has prompted a considerable amount of debate concerning whether or not the NFL as we currently know it could be dramatically reconfigured under the pressure of more lawsuits. In other words, the inherent but often ignored or concealed dangers of football are now being exposed by legal action, in much the same way as the tobacco industry was unmasked and the entire culture of smoking has consequently changed radically in the last couple of decades.

With the release of the Education Policy Analysis Archives (EPAA) Special Issue on “Value-Added Model (VAM) Research for Educational Policy,” a similar question should now be raised about the future of implementing high-stakes accountability policies that focus on teacher evaluation and retention through VAM-style metrics.

“High-Stakes Implementation of VAM,…Premature”

Two articles in the special issue from EPAA examine the validity and reliability of VAM-based teacher evaluation in high-stakes settings and then place these policies in the context of the legal ramifications that districts and states may face for those policies.

“The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era” (Baker, Oluwole, & Green, 2013) identifies the current trend: “Spurred by the Race-to-the-Top program championed by the Obama administration and a changing political climate in favor of holding teachers accountable for the performance of their students, many states revamped their tenure laws and passed additional legislation designed to tie student performance to teacher evaluations” (p. 3). Because of the political and public momentum behind reforming teacher evaluation, Baker, Oluwole, and Green seek “to bring some urgency to the need to re-examine the current legislative models that put teachers at great risk of unfair evaluation, removal of tenure, and ultimately wrongful dismissal” (p. 5).

While Baker, Oluwole, and Green offer a detailed and evidence-based examination of the VAM-based and student growth model approaches to high-stakes teacher accountability, they ultimately place the weaknesses of reform policies in the context of potential challenges from teachers who believe they have been wrongfully evaluated or dismissed:

“In this section, we address the various legal challenges that might be brought by teachers dismissed under the rigid statutory structures outlined previously in this article. We also address how arguments on behalf of teachers might be framed differently in a context where value-added measures are used versus one where student growth percentiles are used. Where value-added measures are used, we suspect that teachers will have to show that while those measures were intended to attribute student achievement to their effectiveness, the measures failed to do so in a number of ways. That is, where value-added measures are used to assign effectiveness ratings, we suspect that the validity and reliability, as well as understandability of those measures would need to be deliberated at trial. However, where student growth percentiles are used, we would argue that the measures on their face are simply not designed for attributing responsibility to the teacher, and thus making such a leap would necessarily constitute a wrongful judgment. That is, one would not necessarily even have to vet the SGP measures for reliability or validity via any statistical analysis, because on their face they are invalid for this purpose.”

The analysis ultimately discredits both the use of narrow metrics to determine teacher quality and the high-stakes policies being implemented using those metrics, concluding with the ironic consequences of these policies: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers” (pp. 18-19).

In “Legal Issues in the Use of Student Test Scores and Value-added Models (VAM) to Determine Educational Quality” (Pullin, 2013), the rapid increase of VAM-based accountability is further examined in the context of “a wide array of potential legal issues [that] could arise from the implementation of these programs” (p. 2).

Pullin notes the motivation for reforming teacher evaluation:

“VAM initiatives are consistent with a highly publicized press from the business community and many politicians to make government services more like private business, data-driven to measure productivity and accountability (Kupermintz, 2003). VAM approaches are in part a response to concerns that the current system of selecting and compensating teachers based [on] their education and credentials is insufficient for insuring teacher quality (Corcoran, 2011; Gordon, Kane & Staiger, 2006; Hanushek & Rivkin, 2012; Harris, 2011). There have been increasing expressions of concern that teacher evaluation practices are not robust and do not improve practice (Kennedy, 2010). In the contemporary public policy context, much of the support for the use of student test scores for educator evaluation comes from a concern that the current system for evaluation is ineffective and that the current legal protections for teachers are too cumbersome for schools seeking to terminate teachers (Harris, 2009, 2011).”

While a business model for addressing quality control of a work force may seem efficient, Pullin highlights that legal ramifications are likely with these new models.

Pullin’s analysis offers a detailed and useful examination of previous court cases involving the use of test scores to evaluate educators, including recent cases involving VAM, concluding that the picture is not clear on how the courts may rule in the future, but that a pattern exists of “heavy judicial deference to state and local education policymakers and the allure of using test scores to make decisions about education quality” (p. 5).

Further, Pullin notes “there are differences of perspective among social scientists about VAM and the defensibility of using it to make high-stakes decisions about educators,” further complicating the concerns of legal action (p. 9).

While raising many other complications, Pullin also notes that students and parents may enter legal battles using VAM metrics “to substantiate their own legal claims that schools are not meeting their obligations to provide education” (p. 14).

Pullin concludes with a sobering look at teacher quality reform built on VAM and implemented in high-stakes environments:

“In the broad contemporary public policy context for education reform, the desire for accountability and transparency in government, coupled with heavily financed criticisms of public school teachers and their unions, may mean that VAM initiatives will prevail. The concerns of education researchers about VAM, coupled with legal obligations for the validity and reliability of education and evaluation programs should require judges and education policymakers to take a closer look for future decision-making. At the same time, the social science research community should be generating substantial new and persuasive evidence about VAM and the validity and reliability of all of its potential uses. For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.” (p. 17)

Like the NFL, federal and state governments may soon be compelled to reform the reform movement under the threat of legal action from a variety of stakeholders since the science of teacher evaluation remains far behind the curve of implementation, particularly when teacher evaluation is high-stakes and based on VAM and other metrics linked to student test scores.

The special issue from EPAA is yet another call for political leadership to pause if not end wide-scale teacher evaluation and retention models that pose legal, statistical, and funding challenges that those leaders appear unwilling to acknowledge or address.