Category: Statistics

More Thoughts on Feedback, Grades, and Late Work

My good friend and stellar colleague, Ken Lindblom, posted Should Students’ Grades Be Lowered for Lateness?, spurring a series of Tweets about grading late work.

Ken’s thoughtful post focuses on these foundational ideas:

As an educator, I try to base my decisions on a principle of authenticity. In other words, I try to make my decisions more on real-world norms than traditional school norms. I try to ensure that I am preparing students for the world beyond school, not for school. As a result, I try to make sure that the ways in which I assess students’ work is similar to the ways in which they would be assessed in a professional situation.

There are times when a professional can absolutely not be late: grant applications, proposals for conferences/speaking, . . . I’m not sure I can come up with a third example to make a series.

But adults can be late with almost anything else: publication deadlines, job evaluations, doctor’s appointments, taxes–even most bills have a grace period.

Here I want to tease out a few ideas related to feedback on student work (artifacts of learning), grades, and late work.

Like Ken, with his concern for authenticity, I tend to work from a personal and professional aversion to hypocrisy based on 18 years teaching English in a rural South Carolina public high school and then 14-plus years in a selective liberal arts university, also in SC.

I have been practicing and refining de-grading and de-testing practices for over thirty years. Let me emphasize, since I have been challenged before, I have implemented—and thus currently advocate for—de-grading and de-testing in many school contexts, including public schools (not just at the university level).

So my path to rejecting grades and tests has many stages and elements. First, I had to confront that calculating grades bound only to averages often distorts grades unfairly for students. Mean, median, and mode are all credible ways to analyze data, and among them, in formal schooling, the mean (average) is both the norm and often the weakest.

I show students this simple example; a series of grades: 10, 10, 85, 85, 85, 85, 85, 85, 100, 100 = 730.

The average is 73, which most teachers would assign, but the mode is 85, and if we note these grades are sequential and cumulative (10 as the first grade in terms of time, and 100 the last grade), a legitimate grade assignment would be the 100.

In other words, using the same data, a teacher could assign 73, 85, or 100 to this student, and all can be justified statistically.
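The arithmetic behind those three candidate grades can be checked with a short Python sketch. This is only an illustration using the standard library's statistics module; the variable names are my own, and "most_recent" simply assumes the list is in chronological order, as described above.

```python
from statistics import mean, median, mode

# The series of grades from the example, in chronological order
# (10 is the earliest grade, 100 the most recent).
grades = [10, 10, 85, 85, 85, 85, 85, 85, 100, 100]

summary = {
    "sum": sum(grades),          # total points across all ten grades
    "mean": mean(grades),        # the traditional "average"
    "median": median(grades),    # the middle of the sorted grades
    "mode": mode(grades),        # the most frequent grade
    "most_recent": grades[-1],   # the last grade earned (assumes time order)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean comes out to 73, the mode to 85, and the most recent grade to 100, matching the three grades a teacher could defend. The median, which the example does not mention, also lands on 85.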

My conclusion has been this greatly challenges the value of assigning grades because those who control the rules, control reality.

Thus, I do not assign grades to any student artifacts of learning (and I do not give traditional tests). Instead I offer feedback that supports students as they revise and resubmit those artifacts.

However, I cannot refuse to assign students grades for courses. Therefore, another distinction I have come to appreciate is the difference between grading an assignment and determining a grade for a grading period or course.

Therein lies my approach to late work, but first, let’s consider adult hypocrisy.

In my 30-plus years as an educator at nearly every level possible, I have daily witnessed teachers and professors who fail to meet deadlines (regularly); talk, do other things (grade papers), stare at their computers/smart phones, etc., during meetings; and behave in a number of ways that they do not tolerate from students in their classes, behaviors that negatively impact students’ grades.

I also drive daily with adult motorists who exceed the speed limit without any punishment, since most of us have come to rely on a grace zone of staying less than ten mph over that limit. In other words, the real world of rules is much fuzzier than the rules of formal schooling.

These are the behaviors I see when I am confronted with student late work.

About late work, then, I have some clear policies. First, I would never change a grade assigned to an artifact of learning that distorts the actual quality of that artifact. A “B” essay is a “B” essay regardless of when it is submitted.

As an educator, my primary concern is student learning, and I suffer no delusions that when that happens is more important than if it happens. I also subscribe to Rick Wormeli’s dictum that fair isn’t always equal; thus, I do not allow the very narrow expectation that I treat all students exactly the same to override the fact that I am there to serve each student as well as all students.

Next, I always record “lateness” and then consider that when I assign a grade for a grading period or course. If a student has one or two assignments late (clearly an outlier), I may ignore that when determining the grading period/course grade, but if there is a pattern of lateness, then the grading period/course grade must reflect this.

In other words, I believe we must separate artifact quality (the basis of grading period or course grades) from grading period/course grades.

Feedback and grades on artifacts of learning send students clear messages about what they produce (their learning), and then grading period/course grades send a message about the totality of their accomplishments as students.

So if we return to Ken’s context, we can imagine a manager telling a habitually late worker: “Your work here is excellent, but if you aren’t here on time, we will have to let you go.”

Especially in the recent thirty-plus years of standards, educators have fallen prey to standardization, and as a result, we have too often abdicated our professional autonomy and allowed technical norms to supplant our much more important goals and obligations, the human dignity and learning of each child assigned to our care.

And because most people have greater regard for medical doctors than teachers (sigh), I’ll end with an example my major professor offered in my doctoral program.

A patient is admitted to the hospital running a dangerously high temperature. After several days, during all of which the nurses record that patient’s temperature hourly, the doctor comes in, adds those temperatures, calculates the average, and refuses to release the patient, although the current temperature is 98.6.

Right, no medical doctor would allow the norm of averages to override her/his medical authority. And neither should educators.

See Also

Missing Assignments–and the Real World, Nancy Flanagan

The Perils of Late Work and How to Make It Count, Starr Sackstein

What Football Reveals for Education about False Allure of Quantification

If you want to understand the inherent complexity of professional football, you may want to start with a person with a long and rich career in the sport. For example, consider “Super Bowl-winning former Ravens coach Brian Billick” responding to the rise of metrics and statistics in the NFL:

“One of the most common questions I get is, Can you do Moneyball, for lack of a better term, in the NFL? And the answer is, No, you can’t,” Billick said. “You can’t quantify the game of football the way you do baseball. It’s not a statistical game. The parameters of the game, the number of bodies and what they’re doing in conjunction with one another.”

The collaborative and human (although not humane) elements of football, it appears, render the power of statistics less predictive—and less useful—in the sport than in baseball.

One lesson, then, seems to be that statistics are not universally valid and predictive, particularly in contexts that are highly complex.

I am reminded of the post-Katrina analysis of the pre-landfall models for the massive hurricane. That image of hundreds of models was a nightmare of confusion, lending little in valuable predictive information for anyone.

The post-Katrina data on just what did occur, however, were fascinating and powerful.

Since the U.S. cares more about the NFL than public education, Billick’s skepticism and warnings are likely to be better heeded than decades of similar warnings from teachers about the rise of measurable data (mostly high-stakes test scores) in evaluating students, teachers, and schools.

In education, the tug-of-war continues, and I fear, those of us siding with Billick in the context of education are not faring well.

See this report advocating more metrics in teacher quality pursuits, Smart, Skilled, and Striving: Transforming and Elevating the Teaching Profession, and then a review, mostly discrediting the report as the abstract notes:

This report from the Center for American Progress offers 10 recommendations for improving the public perceptions of and experiences of classroom teachers. While elements of these recommendations would likely be beneficial, they also include policy changes that would increase surveillance of teachers, reduce teachers’ job security, evaluate teachers by students’ test scores, and create merit pay systems that would likely have the opposite effect. For evidence, the report relies too heavily on popular rhetoric, sound bites, opinion articles, and advocacy publications to advance a policy agenda that in many ways could do further harm to the teaching profession. However, many of the report’s recommendations do align with policy reforms currently being proposed for the Higher Education Act and included in the Elementary and Secondary Education Act reauthorizations and are therefore important to read critically and consider carefully. In advancing evaluation of teachers by test scores, the report goes against the cautions and guidelines recently released by the American Statistical Association and the American Educational Research Association. Other than a review of contemporary issues, the report offers little of substance to advance the teaching profession.

Let’s hold our collective breaths about which will win out. Any predictions?

UPDATED: Mainstream Media in (Perpetual) Crisis: More Education Meat Grinder

UPDATE: Note Holly Yettick’s One Small Droplet: News Media Coverage of Peer-Reviewed and University-Based Education Research and Academic Expertise; see abstract:

Most members of the American public will never read this article. Instead, they will obtain much of their information about education from the news media. Yet little academic research has examined the type or quality of education research and expertise they will find there. Through the lens of gatekeeping theory, this mixed-methods study aims to address that gap by examining the prevalence of news media citations of evidence that has undergone the quality-control measure of peer review and expertise associated with academics generally required to have expertise in their fields. Results suggest that, unlike science or medical journalists, education writers virtually never cite peer-reviewed research. Nor do they use the American Educational Research Association as a resource. Academic experts are also underrepresented in news media coverage, especially when compared to government officials [bold added]. Barriers between the news media and academia include structural differences between research on education and the medical or life sciences as well as journalists’ lack of knowledge of the definition and value of peer review and tendency to apply and misapply news values to social science research and expertise.

“‘Only four out of ten U.S. children finish high school, only one out of five who finish high school goes to college’”: This spells doom for the U.S. economy, or to be more accurate, this spelled doom for the U.S. economy.

Except it didn’t, of course, as it is a quote in a 1947 issue of Time from John Ward Studebaker, a former school superintendent who served as U.S. Commissioner of Education (analogous to today’s Secretary of Education) in the mid-1940s.

Jump forward to 26 December 2015 and The New York Times article As Graduation Rates Rise, Experts Fear Diplomas Come Up Short. Motoko Rich, as in the Time article, builds her case on Secretary of Education Arne Duncan, as Susan Ohanian confronts:

Here’s a front page. above-the-fold New York Times non-story that’s a perfect depiction of damning schools every-which-way. Schools with low graduation rates are depicted as failures; improve graduation rates, and then the diplomas they’re handing out are judged to have no meaning. And the Times gives the departing Secretary of Education star billing on this issue.

Quotation of the Day
The goal is not just high school graduation. The goal is being truly college and career ready.

–ARNE DUNCAN, the departing secretary of education, on the United States 82 percent graduation rate in 2013-14, the highest on record.–New York Times, Dec. 27, 2015

Along with the meat grinder of incessantly new high-stakes accountability standards and testing over the past thirty-plus years, U.S. public education has been demonized since the mid-1900s and relentlessly framed within crisis discourse by the mainstream media for a century.

Rich’s cover piece spends an inordinate amount of energy to twist public schools into that crisis image while making no effort to investigate or challenge Duncan (a life-long appointee with no expertise in education and no credibility as a leader in education) or to unpack the stale platitudes and unsubstantiated claims about education reaching back at least to the Time article.

Duncan and Rich share, in fact, both a lack of experience or education in teaching and the disproportionate power of their voices in the field despite that lack of expertise.

On the other hand, I taught public high school English in rural South Carolina (not far from the school Rich highlights), have been an educator in SC over 30 years total, have a doctorate in education that emphasized the history of the field, and now am a teacher educator at a university just a couple miles from the school in Rich’s piece (I know teachers there, and have had several teacher candidates placed there for field work). As well, I taught journalism and was the faculty sponsor of the school newspaper, and have been a professional writer for about the same amount of time as I have been teaching, including writing and publishing a good deal of journalism (mostly about education).

This is not, however, an attack on Duncan or Rich—because they are not unique but typical of the mismatch of high-level voice with a lack of expertise.

Mainstream media appear fatally wed to only one version of the U.S. public education story: crisis.

And thus, journalists reach out to the same know-nothings (political leaders, political appointees, think-tank talking heads) and reproduce the same stories over and over and over [1].

Here, then, let me offer a few keys to moving beyond the reductive crisis-meme-as-education-journalism:

  • Public education has never been and is not now in crisis. “Crisis” is the wrong metaphor for entrenched patterns that have existed over a century. A jet plane crash landing into the Hudson River is a crisis; public education suffers under forces far more complicated than a crisis.
  • Metrics such as high-stakes test scores and graduation rates have always told us, and still tell us, more about the conditions of children’s lives than to what degree public schools are effective.
  • Short-hand terms such as “college and career ready” and “grade-level reading” are little more than hokum; they are the inadequate verbal versions of the metrics noted above.
  • The nebulous relationship between the quality of education in the U.S. and the fragility of the U.S. economy simply has never existed. Throughout the past century, no one has ever found any direct or clear positive correlation between measures of educational quality in the U.S. and the strength of the U.S. economy.
  • Yes, racial and class segregation is on the rise in the U.S., and so-called majority-minority schools as well as high-poverty schools are quickly becoming the norm of public education. While demographics of race and class remain strongly correlated with the metrics we use to label schools as failing, the problem lies in the data (high-stakes tests remain race, class, and gender biased), not necessarily the students, teachers, or administrators.
  • However, historically and currently, public education’s great failures are two-fold: (1) public schools reflect the staggering social inequities of the U.S. culture, and (2) public schools too often perpetuate those same inequities (for example, tracking and disciplinary policies).

The mainstream media’s meat grinder of crisis-only reporting on public education achieves some extremely powerful and corrosive consequences.

First, the public remains grossly misinformed about public schools as a foundational institution in a democracy.

Next, that misleading and inaccurate crisis narrative fuels the political myopia behind remaining within the same education policy paradigm that has never addressed the real problems and never achieved the promises attached to each new policy (see from NCLB to ESSA).

And finally, this fact remains: Political and public will in the U.S. has failed public education; it has not failed us.

Mainstream media remain trapped in the education crisis narrative, I think, because neither the media nor the collective political/public consciousness is willing to confront some really ugly truths beneath the cultural commitment to the powerful and flawed rugged individual mythology in the U.S.: America is a classist, racist, and sexist society.

We are committed to allowing privilege to beget privilege and to pretending that the fruits of privilege are the result of effort and merit.

There is no crisis in education, but our democracy is being held hostage by incompetent politicians and a compliant mainstream media—all of which, ironically, would be served well by the sort of universal public education envisioned by the tarnished founding fathers’ idealistic (and hypocritical) rhetoric [2].

[1] See Educational Expertise, Advocacy, and Media Influence, Joel R. Malin and Christopher Lubienski; The Research that Reaches the Public: Who Produces the Educational Research Mentioned in the News Media?, Holly Yettick; The Media and Educational Research: What We Know vs. What the Public Hears, Alex Molnar

[2] See Thomas Jefferson’s argument for a democracy embracing education:

The object [of my education bill was] to bring into action that mass of talents which lies buried in poverty in every country for want of the means of development, and thus give activity to a mass of mind which in proportion to our population shall be the double or treble of what it is in most countries. ([1817], pp. 275-276)

The less wealthy people, . . . by the bill for a general education, would be qualified to understand their rights, to maintain them, and to exercise with intelligence their parts in self-government; and all this would be effected without the violation of a single natural right of any one individual citizen. (p. 50)

To all of which is added a selection from the elementary schools of subjects of the most promising genius, whose parents are too poor to give them further education, to be carried at the public expense through the colleges and university.  (p. 275)

By that part of our plan which prescribes the selection of the youths of genius from among the classes of the Poor, we hope to avail the State of those talents which nature has sown as liberally among the poor as the rich, but which perish without use, if not sought for and cultivated. But of all the views of this law none is more important none more legitimate, than that of rendering the people the safe, as they are the ultimate, guardians of their own liberty. (p. 276)

The tax which will be paid for this purpose is not more than the thousandth part of what will be paid to kings, priests and nobles who will rise up among us if we leave the people in ignorance. (p. 278)

More on Evidence-Based Practice: The Tyranny of Technocrats

Depending on your historical and literary preferences, spend a bit of time with Franz Kafka or Dilbert and you should understand the great failure of the standards movement in both how we teach and how we certify teachers—bureaucracy.

Bureaucracy tends to be inadequate because bureaucrats themselves often lack professional or disciplinary credibility or experience, depending instead on the status of their authority to impose mandates. For education, Arne Duncan serves well as the face of the bureaucrat, an appointee who has only the bully pulpit of his appointment to hold forth on policy.

However, just as corrosive to education, and ultimately to evidence-based practice, is the technocrat.

Technocrats, unlike bureaucrats, are themselves credible, although narrowly so. For technocrats, “evidence” is only that which can be measured, and data serve to draw generalizations from randomized samples.

In short, technocrats are interested not in the real world but in the powerful narcotic of the bell-shaped curve.

As a result, a technocrat’s view often fails human decency (think Charles Murray and Richard Herrnstein) and certainly erases the very human reality of individual outliers.

The face of the technocrat, in fact the technocrat’s technocrat, is Daniel Willingham, whose work is often invoked as if handed down by the hand of God, chiseled on tablets. [1] [Note: If you sense snark here, I am not suggesting Willingham’s work is flawed or unimportant (I would say important but narrow), but am being snarky about how others wield the technocratic hammer in his name.]

And it is here I want to return to a few points I have made recently:

  • Even the gold standard of experimental research fails the teacher in her day-to-day work because her classroom is not a random sampling of students, because her work is mostly with outliers.
  • And in the teaching moment, what counts as evidence becomes that teacher’s experience couched in that teacher’s content and teaching knowledge as all of that happens against the on-going evidence of the act of teaching.

Stewart Riddle, offering yet another effort in the reading war, is essentially speaking for evidence-based practice while raising a red flag against the tyranny of the technocrat, embodied by the systematic phonics crowd (those who wave the Willingham flag, for example).

On Twitter, in response to my piece on evidence-based policy and practice, Nick Kilstein raised a great point:

My ultimate response (prompting this blog):

My thoughts here, building on the bullet points above, are that having our practice informed by a wide range of evidence (including important evidence from technocrats, but also from other types of evidence, especially qualitative research [2] that can account for outliers, nuance, and the unexpected) is much different than having our practice mandated by evidence (think intensive, systematic phonics for all children regardless of needs or fluency because that is the program the school has adopted).

For day-to-day teaching, the tensions of the disciplines remain important: what we can measure against what measuring cannot address.

When Willingham proclaims that a certain type of research does not support the existence of learning styles, for example, teachers should use that to be very skeptical of the huge amount of oversimplified and misguided “teacher guides” and programs that espouse learning styles templates, practices, and models. [3]

But day-to-day teaching certainly reveals that each of our students is different, demanding from us some recognition of those differences in both what and how we teach them.

It is in the face of a single child that technocrats fail us—as Simon P. Walker notes:

Some educational researchers retreat to empiricist methods. Quantitative studies are commissioned on huge sample sizes. Claims are made, but how valid are those claims to the real-life of the classroom? For example, what if one study examines 5,000 students to see if they turn right rather than left after being shown more red left signs. Yes, we now with confidence know students turn left when shown red signs. But so what?  What can we extrapolate from that?  How much weight can that finding bear when predicting human behaviour in complex real world situations where students make hundreds of decisions to turn left and right moment by moment? The finding is valid but is it useful?

If that child needs direct phonics or grammar instruction, then I must offer it. If that child is beyond direct phonics and grammar instruction or if that direct instruction inhibits her/his learning to read and write, then I must know other strategies (again, this is essentially what whole language supports).

The tyranny of the bureaucrats is easy to refute, but the tyranny of the technocrat is much more complicated since that evidence is important, it does matter—but again, evidence of all sorts must inform the daily work of teaching, not mandate it.

Professional and scholarly teachers are obligated to resist the mandates by being fully informed; neither compliance nor ignorance serves us well as a profession.

[1] For more on worshipping technocrats, explore this, notably the cult of John Hattie and that those who cite his work never acknowledge the serious concerns raised about that work (see the bottom of the post).

[2] Full disclosure, I wrote a biography for my EdD dissertation (published here), and also have written a critical consideration of quantitative data.

[3] See, for example, how evidence (Hart and Risley) functions to limit and distort practice in the context of the “word gap.” The incessant drumbeat of the “Hart and Risley” refrain is the poster child of the tyranny of technocrats.

Media Fail, 10,000 hours, and Grit: The Great Media-Disciplines Divide, pt. 2

In his The Danger of Delegating Education to Journalists: Why the APS Observer Needs Peer Review When Summarizing New Scientific Developments [1], K. Anders Ericsson makes several key points about how the mainstream media present disciplinary knowledge to the public, focusing on Malcolm Gladwell’s misleading but popular 10,000 hour rule.

Ericsson’s key points include:

Although I accept that the process of writing an engaging popular article requires considerable simplification, I think it is essential that the article does not contain incorrect statements and misinformation. My primary goal with this review is to describe several claims in Jaffe’s article that were simply false or clearly misleading and then discuss how APS might successfully develop successful methods for providing research summaries for non-specialists that are informative and accurately presents the major views of APS members and Fellows. At the very least they should not contain factually incorrect statements and avoid reinforcing existing misconceptions in the popular media.

Through the Gladwell/10,000 hour rule example, Ericsson provides an important argument relevant to the current (and historical) public debate about school quality, teaching and learning, and education reform.

Much in the same way Gladwell has misrepresented research (which is typical within the media), and how that has been uncritically embraced by the media and public (as well as many if not most practitioners), a wide array of issues have met the same fate: learning styles, “grit,” collaborative learning, progressive education, charter schools, school choice, language gap, and so on.

Even when a claim or practice has a kernel of research at its source, popular oversimplification (often by journalists, but practitioners as well) and then commercialization/politicizing (creating programs and policies through publishers, “star” advocates, and legislation) significantly distort that research.

Education Has Failed Research, Historically

John Dewey represents an odd paradox in that he is possibly the most mentioned educator in the U.S. (either as the source of all that is wrong in education or idealistically cited as all that is right about how school could be), despite the reality that Dewey is mostly misunderstood and misrepresented; and thus his philosophy, progressivism, remains mostly absent in U.S. public schools.

Dewey can be blamed, in part, for this reality because he refused on principle to allow his experiments in education to be carefully catalogued, believing that no educational practice should become a template for others.

Throughout much of the twentieth century, Lou LaBrant, a vigilant progressive educator, spent much of her career practicing and advocating for progressive literacy instruction, but LaBrant also confronted the many instances of how progressivism was misrepresented.

Broadly, and early, LaBrant recognized the public confusion about progressivism:

Two adults speak of “progressive education.” One means a school where responsibility, critical thinking, and honest expression are emphasized; the other thinks of license, lack of plans, irresponsibility. They argue fruitlessly about being “for” or “against” progressive education. (LaBrant, 1944, pp. 477-478)

But she also confronted how progressivism was mostly distorted in its application. LaBrant’s criticisms still reflect why education has failed research, and why research has not failed education.

Credible educational research-based philosophy, theory, and pedagogy are often corrupted by oversimplification.

In 1931, LaBrant published a scathing criticism of the popularity of the project method, an oversimplification of Dewey that resulted in students doing crafts in English class instead of reading or writing:

The cause for my wrath is not new or single. It is of slow growth and has many characteristics. It is known to many as a variation of the project method; to me, as the soap performance. With the project, neatly defined by theorizing educators as “a purposeful activity carried to a successful conclusion,” I know better than to be at war. With what passes for purposeful activity and is unfortunately carried to a conclusion because it will kill time, I have much to complain. To be, for a moment, coherent: I am disturbed by the practice, much more common than our publications would indicate, of using the carving of little toy boats and castles, the dressing of quaint dolls, the pasting of advertising pictures, and the manipulation of clay and soap as the teaching of English literature. (p. 245)

Credible educational research is often corrupted by commercialization/politicizing, reducing that research to misguided programs/legislation.

“[L]anguage behavior can not be reduced to formula,” LaBrant (1947) argued (p. 20)—emphasizing that literacy growth was complicated but flourished when it was child-centered and practical (for example, in the ways many privileged children experience in their homes because one or more of the parents are afforded the conditions within which to foster their children’s literacy).

By mid-twentieth century, LaBrant (1949) had identified the central failure of teaching reading: “Our language programs have been set up as costume parties and not anything more basic than that” (p. 16).

For the 80-plus years since LaBrant fought this fight, the same patterns of the media, politicians, the public, and practitioners failing educational research have continued.

Oversimplification, Commercialization/Politicizing: Recovering the Evidence

The list is incredibly long, too long to be exhaustive here, but consider the following: sloganism (“Work hard. Be nice.”), silver-bullet ideologies (“grit,” 10,000-hour rule), miracle schools (KIPP), evidence-based programs (DIBELS, 4-block, 6-traits), common sense claims and policy absent evidence (Common Core), and trendy legislation (3rd-grade retention policies as reading policy, merit pay) as well as politicized government reports (National Reading Panel).

Each of these can be traced to some kernel of research (sometimes robust bodies of research, and sometimes cherry-picked research), but all of these represent a current and historical fact: Education has failed research, but research has not failed education.

When educational research is reduced to scripts or programs/legislation, that knowledge base is invariably distorted, corrupted—as Ericsson details well above.

Journalists, politicians, and commercial education entities have all played a fundamental and crippling role in this reality; thus, as Ericsson argues, educators, scholars and researchers must not allow the fate of educational research to remain primarily in the wrong hands.

We have a public and professional obligation to confront these oversimplifications as well as the commercialization/politicizing of educational research. And we must do this through our public work that speaks to those failures and the public simultaneously.

As LaBrant and Ericsson reveal, unless we take that call seriously, we too are part of the reason education continues to fail research.

References

LaBrant, L. (1949). A genetic approach to language. Unpublished manuscript, Institute of General Semantics, Lakeville, CT.

LaBrant, L. (1947). Um-brel-la has syllables three. The Packet, 2(1), 20-25.

LaBrant, L. (1944, November). The words they know. The English Journal, 33(9), 475-480.

LaBrant, L. (1931, March). Masquerading. The English Journal, 20(3), 244-246.

For Further Reading

U.S. and Education Reform Need a Critical Free Press

My Open Letter to Journalists: A Critical Free Press, pt. 2

NPR Whitewashes “Grit” Narrative

Shiny Happy People: NPR, “Grit,” and “Myths that Deform” pt. 2

How I Learned to Distrust the Media (about Education)

My (Often Painful) Online Education

[1] See original and downloadable link to the paper here.

What We Know Now (and How It Doesn’t Matter)

Randy Olson’s Flock of Dodos (2006) explores the evolution and Intelligent Design (ID) debate that represents the newest attack on teaching evolution in U.S. public schools. The documentary is engaging, enlightening, and nearly too fair considering Olson admits upfront that he stands with scientists who support evolution as credible science and reject ID as something outside the realm of science.

Olson’s film, however, offers a powerful message that rises above the evolution debate. Particularly in the scenes depicting scientists discussing (during a poker game) why evolution remains a target of political and public interests, the documentary shows that evidence-based expertise often fails against clear and compelling messages (such as “teach the controversy”)—even when those clear and compelling messages are inaccurate.

In other words, ID advocacy has often won in the courts of political and public opinion despite having no credibility within the discipline it claims to inform—evolutionary biology.

With that sobering reality in mind, please identify what XYZ represents in the following statement about “What We Know Now”:

Is there a bottom line to all of this? If there is one, it would appear to be this: Despite media coverage, which has been exceedingly selective and misrepresentative, and despite the anecdotal meanderings of politicians, community members, educators, board members, parents, and students, XYZ have not been effective in achieving the outcomes they were assumed to aid….

This analysis addresses school uniform policies and was conducted by sociologist David L. Brunsma, who examined evidence on school uniform effectiveness (whether school uniform policies achieved their stated goals) “from a variety of data gathered during eight years of rigorous research into this issue.”

This comprehensive analysis of research from Brunsma replicates the message in Flock of Dodos—political, public, and media messaging continues to trump evidence in the education reform debate. Making that reality more troubling is that a central element of No Child Left Behind was a call to usher in an era of scientifically based education research. As Sasha Zucker notes in a 2004 policy report for Pearson, “A significant aspect of the No Child Left Behind Act of 2001 (NCLB) is the use of the phrase ‘scientifically based research’ well over 100 times throughout the text of the law.”

Brunsma’s conclusion about school uniform policies, I regret to note, is not an outlier in education reform but a typical representation of education reform policy. Let’s consider what we know now about the major education reform agendas currently impacting our schools:

Well into the second decade of the twenty-first century, then, education reform continues a failed tradition of honoring messaging over evidence. Neither the claims made about educational failures, nor the solutions for education reform policy today are supported by large bodies of compelling research.

As the fate of NCLB continues to be debated, the evidence shows not only that NCLB has failed its stated goals, but also that politicians, the media, and the public have failed to embrace the one element of the legislation that held the most promise—scientifically based research—suggesting that dodos may in fact not be extinct.

* Santelices, M. V., & Wilson, M. (2010, Spring). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1), 106-133; Spelke, E. S. (2005, December). Sex differences in intrinsic aptitude for mathematics and science? American Psychologist, 60(9), 950-958; See page 4 for 2012 SAT data: http://media.collegeboard.com/digitalServices/pdf/research/TotalGroup-2012.pdf

NFL again a Harbinger for Failed Education Reform?

During the impending NFL strike in 2011—the act of a union—I drew a comparison between how the public in the U.S. responds to unionization in different contexts:

“I am speaking about the possible NFL strike that hangs over this coming Super Bowl weekend: a struggle between billionaires and millionaires, which, indirectly, shines an important light on the rise of teacher and teacher union-bashing in the US. Adam Bessie, in Truthout, identifies how the myth of the bad teacher has evolved.”

Once again, the NFL is facing a situation that I believe and even hope is another harbinger of how education reform can be halted: A suit filed by the family of Junior Seau:

“The family said the league not only ‘propagated the false myth that collisions of all kinds, including brutal and ferocious collisions, many of which lead to short-term and long-term neurological damage to players, are an acceptable, desired and natural consequence of the game,’ but also that ‘the N.F.L. failed to disseminate to then-current and former N.F.L. players health information it possessed’ about the risks associated with brain trauma.”

This lawsuit has prompted considerable debate about whether the NFL as we currently know it could be dramatically reconfigured under the pressure of more lawsuits. In other words, the inherent but often ignored or concealed dangers of football are now being exposed by legal action, in much the same way as the tobacco industry was unmasked and thus the entire culture of smoking has radically changed in the last couple of decades.

With the release of the Education Policy Analysis Archives (EPAA) Special Issue on “Value-Added Model (VAM) Research for Educational Policy,” a similar question should now be raised about the future of implementing high-stakes accountability policies that focus on teacher evaluation and retention through VAM-style metrics.

“High-Stakes Implementation of VAM,…Premature”

Two articles in the special issue from EPAA examine the validity and reliability of VAM-based teacher evaluation in high-stakes settings and then place these policies in the context of legal ramifications faced by districts and states for those policies.

“The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era” (Baker, Oluwole, & Green, 2013) identifies the current trend: “Spurred by the Race-to-the-Top program championed by the Obama administration and a changing political climate in favor of holding teachers accountable for the performance of their students, many states revamped their tenure laws and passed additional legislation designed to tie student performance to teacher evaluations” (p. 3). Because of the political and public momentum behind reforming teacher evaluation, Baker, Oluwole, and Green seek “to bring some urgency to the need to re-examine the current legislative models that put teachers at great risk of unfair evaluation, removal of tenure, and ultimately wrongful dismissal” (p. 5).

While Baker, Oluwole, and Green offer a detailed and evidence-based examination of the VAM-based and student growth model approaches to high-stakes teacher accountability, they ultimately place the weaknesses of reform policies in the context of potential challenges from teachers who believe they have been wrongfully evaluated or dismissed:

“In this section, we address the various legal challenges that might be brought by teachers dismissed under the rigid statutory structures outlined previously in this article. We also address how arguments on behalf of teachers might be framed differently in a context where value-added measures are used versus one where student growth percentiles are used. Where value-added measures are used, we suspect that teachers will have to show that while those measures were intended to attribute student achievement to their effectiveness, the measures failed to do so in a number of ways. That is, where value-added measures are used to assign effectiveness ratings, we suspect that the validity and reliability, as well as understandability of those measures would need to be deliberated at trial. However, where student growth percentiles are used, we would argue that the measures on their face are simply not designed for attributing responsibility to the teacher, and thus making such a leap would necessarily constitute a wrongful judgment. That is, one would not necessarily even have to vet the SGP measures for reliability or validity via any statistical analysis, because on their face they are invalid for this purpose.”

The analysis ultimately discredits both the use of narrow metrics to determine teacher quality and the high-stakes policies being implemented using those metrics, concluding with the ironic consequences of these policies: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers” (pp. 18-19).

In “Legal Issues in the Use of Student Test Scores and Value-added Models (VAM) to Determine Educational Quality” (Pullin, 2013), the rapid increase of VAM-based accountability is further examined in the context of “a wide array of potential legal issues [that] could arise from the implementation of these programs” (p. 2).

Pullin notes the motivation for reforming teacher evaluation:

“VAM initiatives are consistent with a highly publicized press from the business community and many politicians to make government services more like private business, data-driven to measure productivity and accountability (Kupermintz, 2003). VAM approaches are in part a response to concerns that the current system of selecting and compensating teachers based their education and credentials is insufficient for insuring teacher quality (Corcoran, 2011; Gordon, Kane & Staiger, 2006; Hanushek & Rivkin, 2012; Harris, 2011). There have been increasing expressions of concern that teacher evaluation practices are not robust and do not improve practice (Kennedy, 2010). In the contemporary public policy context, much of the support for the use of student test scores for educator evaluation comes from a concern that the current system for evaluation is ineffective and that the current legal protections for teachers are too cumbersome for schools seeking to terminate teachers (Harris, 2009, 2011).”

While a business model for addressing quality control of a work force may seem efficient, Pullin highlights that legal ramifications are likely with these new models.

Pullin’s analysis offers a detailed and useful examination of previous court cases involving the use of test scores to evaluate educators, including recent cases involving VAM, concluding that the picture is not clear on how the courts may rule in the future, but that a pattern exists of “heavy judicial deference to state and local education policymakers and the allure of using test scores to make decisions about education quality” (p. 5).

Further, Pullin notes “there are differences of perspective among social scientists about VAM and the defensibility of using it to make high-stakes decisions about educators,” further complicating the concerns of legal action (p. 9).

While raising many other complications, Pullin also notes that students and parents may enter legal battles using VAM metrics “to substantiate their own legal claims that schools are not meeting their obligations to provide education” (p. 14).

Pullin concludes with a sobering look at teacher quality reform built on VAM and implemented in high-stakes environments:

“In the broad contemporary public policy context for education reform, the desire for accountability and transparency in government, coupled with heavily financed criticisms of public school teachers and their unions, may mean that VAM initiatives will prevail. The concerns of education researchers about VAM, coupled with legal obligations for the validity and reliability of education and evaluation programs should require judges and education policymakers to take a closer look for future decision-making. At the same time, the social science research community should be generating substantial new and persuasive evidence about VAM and the validity and reliability of all of its potential uses. For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.” (p. 17)

Like the NFL, federal and state governments may soon be compelled to reform the reform movement under the threat of legal action from a variety of stakeholders since the science of teacher evaluation remains far behind the curve of implementation, particularly when teacher evaluation is high-stakes and based on VAM and other metrics linked to student test scores.

The special issue from EPAA is yet another call for political leadership to pause if not end wide-scale teacher evaluation and retention models that pose legal, statistical, and funding challenges that those leaders appear unwilling to acknowledge or address.