Measuring Proficient Teachers Codifies Bad Teaching

Maja Wilson and Alfie Kohn have found themselves in a problematic minority during the accountability era dedicated to standards, high-stakes testing, and the ever-present rubric.

Rubrics, they argue, ultimately fail to capture complex human behaviors such as writing. While rubrics make measuring human behaviors (such as teaching and learning) statistically manageable, in doing so they also tend to erode the quality of the very behaviors being measured.

As a writing teacher, I can confirm Wilson’s and Kohn’s critiques: student writing that conforms to a rubric and is thus deemed “proficient” or “excellent” can be, and often is, quite bad writing. Rubric-based labels such as “proficient” reflect compliance with the rubric, not writing quality.

Wilson, in fact, has demonstrated this by revising a professional and beautiful piece of writing by Sandra Cisneros so that it conforms to a computer-graded system’s criteria for high-quality writing. The result was more than disturbing: the revised work was substantially worse as writing but correlated better with what the Educational Testing Service (ETS) has deemed “good.”

While Wilson’s experiment focuses on computer-graded writing, that grading rests on a generic rubric used to determine writing quality, and thus, here we begin to investigate why rubric-driven evaluation of complex human behavior always fails:

  • Rubrics reduce the unpredictable to the prescribed.
  • To be practical, rubrics often attempt to be generic enough to cover huge categories—such as writing and teaching—and thus fail to acknowledge that writing poetry is significantly distinct from writing journalism, or that teaching second grade is significantly distinct from teaching high school physics.
  • When rubrics use terminology broad enough to address those varieties, they are useless because they are too vague; when rubrics use terminology that is specific, they are useless because they are unduly prescriptive. If the learning objective is jumping rope and proficiency is “students jump well,” we have no idea what “well” means; if proficiency is “students jump 10 times without missing,” that 10 becomes all that matters (see the sketch after this list). In both cases, complying with the rubric ultimately supersedes the actual jumping of rope.
  • Rubrics replace the substantive feedback that is conducive to learning; in fact, they stall learning and reduce all assessment to the summative.
  • As with high-stakes testing, high-stakes rubrics connected to course grades and/or as part of state accountability systems carry the weight of authority—shifting that authority from teachers and students to the rubric itself and the bureaucracy behind it.
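To make the jump-rope example concrete, here is a minimal sketch (in Python, with purely hypothetical criteria) of what happens once “proficiency” is codified: the code can only check compliance with the rule, never the quality of the jumping.

```python
# A minimal sketch with hypothetical criteria: once proficiency is
# codified, the check measures compliance with the rule, not the
# quality of the jumping itself.

def is_proficient(consecutive_jumps: int) -> bool:
    # "Students jump 10 times without missing": the 10 is all that matters.
    return consecutive_jumps >= 10

# The vague criterion ("students jump well") cannot be encoded at all;
# the precise one encodes the wrong thing. Form, rhythm, and joy do not
# survive the encoding.
print(is_proficient(10))  # True: "proficient," however graceless
print(is_proficient(9))   # False: "not proficient," however skillful
```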

So this brings me back to South Carolina’s teacher evaluation rubric, adapted from the National Institute for Excellence in Teaching (NIET).

SC’s version of the NIET rubric, as I discussed, is marred twice over: it is unmanageable because of its length and inadequate because of the inordinate amount of terminology that is too vague (and again, if we address that vagueness, we are still left with a flawed instrument that is all prescription).

While going through a first session of training in the rubric, I witnessed the greatest problem with using generic rubrics to determine teacher quality: a very bad literacy lesson was pronounced at the high end of “proficient” because of how it conformed to the rubric, yet the lesson was in fact terribly uninspired, overly teacher-centered, and reductive; it also likely eroded significantly the students’ passion for and interest in reading and literacy.

The adoption and implementation of the new teacher quality rubric, however, have been devoted primarily to training those who will evaluate teachers, so that the assessors are familiar with the rubric and the endorsed process; and then, above all else, a central goal is to produce inter-rater reliability with a rubric that NIET and others have already deemed valid.

In other words, this is a statistical enterprise—not an adventure in teaching and learning.

Lost in the technocratic orgy over validity, reliability, and all things scientific, we have made the mistake confronted by John Dewey:

What avail is it to win prescribed amounts of information about geography and history, to win ability to read and write, if in the process the individual loses his own soul: loses his appreciation of things worth while, of the values to which these things are relative; if he loses desire to apply what he has learned and, above all, loses the ability to extract meaning from his future experiences as they occur? (Experience and Education, p. 49)

The irony here, of course, is that Dewey is one of the seminal voices for education being scientific; however, I cannot imagine his expecting this reductive outcome.

All aspects of teaching and learning are poisoned by our misguided pursuit of a very narrow version of “scientific” that has been subsumed by the bureaucratic and turned into pseudo-science.

What avail is it to label a teacher proficient, if in the process the teaching is terribly uninspired, overly teacher-centered, and reductive, if in the process the students are rendered lifeless and uninspired as well?

5 thoughts on “Measuring Proficient Teachers Codifies Bad Teaching”

  1. When computers analyze writing — and computer programs/algorithms are basically rubrics — this is what those algorithms/rubrics have to say about Lincoln’s Gettysburg Address and Shakespeare.

    “Computers never hummed during the days of Shakespeare or Lincoln, but that doesn’t stop them from critiquing great and not-so-great literature of the past and present.” …

    “The computer never let the reputations of Crane, John Steinbeck or William Faulkner intimidate it.

    “After scanning excerpts, the computer said that Crane’s paragraphs were too short in his novel, Red Badge of Courage. The computer did not like Steinbeck’s spelling in his Grapes of Wrath, pointing out dialogue words such as “wanta” and “oncet.” Faulkner went word-crazy in his sentences in Mosquitoes, including one sentence that stretched 56 words.”

    A warning: this piece about computers analyzing and correcting writing is more positive about computers than negative, and writers must learn that they should always have the final say over any computer program, rubric, or other critic, be it human or machine. The key for writers is to learn the rules of writing before breaking them, whenever the writer feels that the “wrong” method, according to a computer, is the better choice. (A checker of the kind described here is sketched after these comments.)

    http://finance.yahoo.com/news/give-poor-money-directly-and-they-dont-spend-it-on-alcohol-and-cigarettes-135858208.html

  2. The exercises you and Lofthouse use as examples would be useful in the learning community for both students and teachers. In a collegial atmosphere, much can be learned as classic writing and thinking is analyzed via rubrics administered by data-hungry administrators and hungry education databases. The weaknesses of “standards-based” learning become apparent immediately. It becomes clear what barriers lie in the way of learning. That can become kind of a bass-ackward learning tool, perhaps.

  3. Good luck with convincing the powers that be in SC that they are going in the wrong direction and will do more harm than good.
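The surface-feature “rubric” described in the first comment can be sketched in a few lines. This is a naive checker with hypothetical thresholds, not any real grading engine; it flags exactly the “crimes” attributed above to Crane (short paragraphs), Steinbeck (dialect spellings), and Faulkner (long sentences).

```python
import re

def critique(text: str,
             min_paragraph_words: int = 40,
             max_sentence_words: int = 50,
             dictionary: frozenset = frozenset()) -> list[str]:
    """Flag 'problems' the way a mechanical surface-feature checker would."""
    flags = []
    for paragraph in text.split("\n\n"):
        if 0 < len(paragraph.split()) < min_paragraph_words:
            flags.append("paragraph too short")
    for sentence in re.split(r"[.!?]+", text):
        if len(sentence.split()) > max_sentence_words:
            flags.append("sentence too long")
    for word in re.findall(r"[a-z']+", text.lower()):
        if dictionary and word not in dictionary:
            flags.append(f"unrecognized word: {word}")
    return flags

# Steinbeck's dialect draws flags; nothing in the checker can register
# that the "misspelling" is the point.
print(critique("Wanta go, George?", dictionary=frozenset({"go", "george"})))
# ['paragraph too short', 'unrecognized word: wanta']
```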
