VAMboozled by Empty-Suit Leadership in SC

Rep. Andy Patrick, R-Hilton Head Island (SC), has made two flawed claims recently, one about leadership and another about teacher evaluation (“S.C. lawmaker proposes teacher evaluation plan,” Charleston Post and Courier, December 10, 2013).*

First, and briefly, Patrick asserts that SC needs leadership for superintendent of education, discounting the importance of experience or expertise. As I will address below, Patrick’s lack of experience and expertise is, ironically, evidence that leadership is not enough. In fact, leadership begins with experience and expertise; it doesn’t replace those essential qualities.

Next, and more importantly, Patrick’s and current Superintendent Mick Zais’s pursuit of test-based teacher evaluation reform is deeply flawed and discredited by research on value added methods (VAM) of evaluating teachers.

Endorsing VAM-heavy teacher evaluation joins grade retention, charter schools, and Common Core in a series of SC policy decisions that run counter to the research base, resulting in a tremendous waste of time and funding that would be better spent on our students and our state.

For example, Edward H. Haertel’s Reliability and validity of inferences about teachers based on student test scores (ETS, 2013) now offers yet another analysis that details how VAM fails, again, as a credible policy initiative. Haertel’s analysis offers the following:

  • First, Haertel addresses the popular and misguided perception that teacher quality is a primary influence on measurable student outcomes. As many researchers have detailed, teachers account for only about 10% of the variance in student test scores. While teacher quality matters, access to experienced and certified teachers, along with out-of-school factors, dwarfs narrow measurements of teacher quality.
  • Next, Haertel confronts the myth of the top quintile teachers, outlining three reasons that arguments about those so-called “top” teachers’ impact are exaggerated.
  • Haertel also acknowledges the inherent problems with test scores and what VAM advocates claim they measure—specifically that standardized tests create a “bias against those teachers working with the lowest-performing or the highest-performing classes” (p. 8).
  • The next two sections detail the logic behind VAM as well as the statistical assumptions in which VAM is grounded, laying the basis for Haertel’s main assertion about using VAM in high-stakes teacher evaluations.
  • The main section of the report reaches a powerful conclusion that matches the current body of research on VAM:

These 5 conditions would be tough to meet, but regardless of the challenge, if teacher value-added scores cannot be shown to be valid for a given purpose, then they should not be used for that purpose.

So, in conclusion, VAM may have a modest place in teacher evaluation systems, but only as an adjunct to other information, used in a context where teachers and principals have genuine autonomy in their decisions about using and interpreting teacher effectiveness estimates in local contexts. (p. 25)

  • In the last brief section, Haertel outlines a short call for teacher evaluations grounded in three evidence-based “common features”:

First, they attend to what teachers actually do — someone with training looks directly at classroom practice or at records of classroom practice such as teaching portfolios. Second, they are grounded in the substantial research literature, refined over decades of research, that specifies effective teaching practices….Third, because sound teacher evaluation systems examine what teachers actually do in the light of best practices, they provide constructive feedback to enable improvement. (p. 26)

Haertel’s concession that VAM has a “modest” place in teacher evaluation is no ringing endorsement, but it certainly refutes the primary—and expensive—role that VAM is playing in proposals to reform teacher evaluation in SC and across the U.S.

Would SC benefit from focusing on teacher quality—as well as ensuring all children have equitable access to experienced and certified teachers? Absolutely.

But current calls by leaders with no experience or expertise in education are failing that possibility by rushing to implement policy that is contradicted by a growing body of research discounting the value of VAM as a key element of teacher evaluation.

SC students, teachers, and schools cannot afford to double down on a failed test-based education culture, and certainly, SC cannot afford more leadership without expertise, which is what Representative Patrick is offering.

* Submitted to the Charleston Post and Courier; unpublished so far.

Please see the VAMboozled website for research refuting VAM.


4 thoughts on “VAMboozled by Empty-Suit Leadership in SC”

  1. Paul,
    Haertel’s summary statement that you post is preceded by the section below. And even all of that does not address other hurdles (random assignment, diverse student population, etc.) that VAM has not cleared to make it acceptable (AT ANY PERCENTAGE) for high stakes purposes. A survey of Haertel’s and other researchers’ catalog of shortcomings makes up Part 3 of TMoE. All these reasons led to the NRC waving Duncan off of VAM in late 2009, which, of course, he summarily dismissed.

    I am sensing an annoying meme emerging that a little bit of VAM for high stakes purposes is okay, which is a very dangerous and incorrect assumption based on the preponderance of evidence to the contrary. It’s almost as dangerous as the back room deals being discussed by the corporate unions and the Gates Foundation to usher in the Common Core in trade for getting rid of teacher eval based on VAM.

    Haertel’s text preceding the summary statement you posted:

    “Are teacher-level VAM scores good for anything, then? Yes, absolutely. But, for some purposes, they must be used with considerable caution. To take perhaps the easiest case first, for researchers comparing large groups of teachers to investigate the effects of teacher training approaches or educational policies, or simply to investigate the size and importance of long-term teacher effects, it is clear that value-added scores are far superior to unadjusted end-of-year student test scores.18 Averaging value-added scores across many teachers will damp down the random noise in these estimates and could also help with some of the systematic biases, although that is not guaranteed. So, for research purposes, VAM estimates definitely have a place. This is also one of the safest applications of VAM scores because the policy researchers applying these models are likely to have the training and expertise to respect their limitations.
    A considerably riskier use, but one I would cautiously endorse, would be providing individual teachers’ VAM estimates to the teachers themselves and to their principals, provided all 5 of the following critically important conditions are met:
    • Scores based on sound, appropriate student tests
    • Comparisons limited to homogeneous teacher groups
    • No fixed weight — flexibility to interpret VAM scores in context for each individual case
    • Users well trained to interpret scores
    • Clear and accurate information about uncertainty (e.g., margin of error)
    First, the scores must be based on sound and appropriate student achievement tests, aligned to the content teachers are expected to cover, providing valid measurements for the full range of student achievement levels, and scaled appropriately. This may sound obvious, but it is in fact a very strong limitation on the applicability of these models. One of the most troubling aspects of some current reform proposals is the insistence on universal application of value-added to all teachers in a district or state. For most teachers, appropriate test data are not available, period. They teach children so young that there are no prior year scores, or they teach untested subjects, or they teach high school courses for which there are no pretest scores that it makes any sense to use.
    Second, comparisons should be limited to fairly homogeneous groups of teachers. Rankings that mix teachers from different grade levels or teaching in schools with very different demographics place severe demands on statistical model assumptions, and the effects of violations of these assumptions are often not well understood. Conservative, local comparisons restricted to single subject areas and grade levels within homogeneous districts are much safer.
    Third, there should be no fixed weight attached to the scores in reaching any consequential decisions. Principals and teachers must have the latitude to set aside an individual’s score entirely — to ignore it completely — if they have specific information about the local context that could plausibly render that score invalid.
    Fourth, anyone using teacher VAM scores in consequential decisions must be well trained to interpret the scores appropriately.
    And, finally, score reports must be accompanied by clear and comprehensible information about the scores’ instability and imprecision, and the range of factors that could render the scores invalid.
    These 5 conditions would be tough to meet, but regardless of the challenge, if teacher value-added scores cannot be shown to be valid for a given purpose, then they should not be used for that purpose.”

