Worst NAEP Reading 2019 Hot Take of the Week

This is a hard feat to pull off, but Put “Whole Language” on Trial by the King of Know-Nothing edureform Michael J. Petrilli is easily the worst NAEP Reading 2019 Hot Take of the week.

Dumb doesn’t get any dumber than this:

An equal-opportunity challenge related to shoddy teacher preparation would likely face the same roadblock. Further, there is little, if any, legal precedent for suing schools of education; even medical schools enjoy significant protections from charges of malpractice related to the physicians they train.

All of which is why it would be important to go after states, and in particular states that have already lost finance-adequacy cases. It would also help if the chosen target states do not require elementary-school teachers to pass an in-depth test of the science of reading before entering the classroom, and if the states host several big education schools that earn failing grades when it comes to preparing candidates to teach reading effectively. States that appear to meet those criteria include Kentucky, South Carolina, and Washington.

Petrilli has entered the “science of reading” debate a little late but just as fact-challenged as the other advocates have been.

So here are a few questions:

  • Where is the scientific research showing that how teacher education programs prepare teachers to teach reading is actually how those teachers teach reading once in the field? (There is none.)
  • Where is the scientific research showing a causal relationship between how reading is taught (and whether those approaches are uniform across an entire state) and NAEP scores? (There is none.)
  • And where is the scientific research to explain—as Petrilli highlights above—these outcomes for Kentucky, South Carolina, and Washington? (There is none.)

[Charts: NAEP Reading 2019 data]

Kentucky, South Carolina, and Washington have 2017 and 2019 NAEP reading scores all over the place—above, at, and below the national average; dropping from 4th to 8th; and dropping from 2017 to 2019 (except one increase by SC, which remains below the national average in every test).

Petrilli is yet another know-nothing grasping at “scientific” straws with no evidence on his side.

This is all hokum, rhetorical grandstanding that proves to be hollow.

Shouting the “science of reading” proves itself once again to be mere ideology.

JoLLE @ UGA November 2019

See Review of Critical Media Literacy and Fake News in Post-Truth America by Christian Z. Goering and Paul L. Thomas

See also three of my poems: “war in the time of commas”, “grammar Nazis (post-apostrophe literature)”, & “like two lower-case pronouns”

Research, the Media, and the Market: A Cautionary Tale

Reporting in The New York Times, Gina Kolata offers a compelling lede:

The findings of a large federal study on bypass surgeries and stents call into question the medical care provided to tens of thousands of heart disease patients with blocked coronary arteries, scientists reported at the annual meeting of the American Heart Association on Saturday.

The new study found that patients who received drug therapy alone did not experience more heart attacks or die more often than those who also received bypass surgery or stents, tiny wire cages used to open narrowed arteries.

And Julie Steenhuysen adds an interesting detail to this new major study: “At least two prior studies determined that artery-clearing and stenting or bypass surgery in addition to medical treatment does not significantly lower the risk of heart attacks or death compared with non-invasive medical approaches alone.”

But these details may prove to be the most important ones of all: “Over $8 billion worth of coronary stents will be sold annually by 2025, according to a new research report by Global Market Insights, Inc. The increase over the years will be created by an increase in artery diseases coupled with a growing demand for minimally invasive surgeries,” explains Stephen Mraz.

So now let’s do the math. If heart doctors shift to what the new research shows, “The nation could save more than $775 million a year by not giving stents to the 31,000 patients who get the devices even though they have no chest pain, Dr. Hochman said,” reports Kolata.
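The back-of-envelope arithmetic is easy to verify; in the minimal sketch below, the implied per-procedure cost is an inference from the reported totals, not a figure stated in the article:

```python
# Back-of-envelope check of the savings figure Kolata reports.
patients = 31_000            # stent recipients with no chest pain (reported)
total_savings = 775_000_000  # potential annual savings in dollars (reported)

# Implied average cost per stent procedure -- an inference, not a reported number.
print(f"${total_savings / patients:,.0f} per procedure")  # -> $25,000 per procedure
```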

Better and less intrusive patient care, lower overall medical costs for a U.S. healthcare system already overburdened—what is there to keep the medical profession from embracing compelling scientific research?

Well, the market of course.

Lower costs come from fewer heart surgeries, meaning heart surgeons lose income—and possibly patients.

Keep in mind that while the medical profession decades ago emphasized best practice in prescribing antibiotics (only when bacterial infections are detected), many doctors found that following best practice led to dissatisfied patients, who flexed their consumer muscles by finding doctors who would usurp best practice and prescribe the requested antibiotics even when they weren’t warranted.

The new research on stents and heart disease treatment is a cautionary tale involving research, the media, and the market—a cautionary tale that should inform the current call for the “science of reading,” especially as that impacts children with dyslexia.

That several studies now show the use of stents should be reduced or at least delayed, and that doctors have resisted that evidence, calls for us to ask an important question about scientific research: In whose interest is the research being applied?

At the International Literacy Association 2019 conference, P. David Pearson, University of California, Berkeley, lays out in about 11 minutes a compelling unpacking of What Research Really Says About Teaching Reading–and Why That Still Matters.

In this framing talk before a panel discussion, Pearson confronts the role of media in misinforming the public about research, challenges advocates of “scientific research” who fluctuate between endorsing research and following “common sense,” and calls for not ignoring “scientific research” but expanding the types of research relied upon to make teaching and learning decisions (recognizing a broad spectrum of evidence-based research that trumps ideology or assumptions).

One of the most compelling examples offered by Pearson is how the media framed research on reading after the report from the controversial National Reading Panel (NRP), at the center of No Child Left Behind’s mandate for scientific research. The headline Pearson highlights is “Systematic, explicit, synthetic phonics improves reading achievement.”

Yet the specific study being cited was actually far more complicated, and less an endorsement of systematic phonics; along with “many other elements…, a small but robust effect for a subset of the population is found on a measure that requires kids to read lists of pseudowords.”

Pearson adds that even if we accept the larger NRP report as valid (and several scholars do not), the report calls for systematic phonics for K-1 students, not older struggling readers. Yet, as Pearson explains, many calling for the “science of reading” push for systematic phonics programs throughout grades well beyond grade 1.

So there exist several traps in calling for scientific research in education and, more narrowly, in the teaching of reading.

As another example, consider Timothy Shanahan’s response to the effectiveness of dyslexia fonts:

Over the past decade or so, three new fonts have appeared (Open Dyslexia, Dyslexie, and Read Regular), all claiming—without any empirical evidence—to somehow aid dyslexic readers.

Since then there have been 8 studies into the value of these fonts.

Most of the studies found no improvement in reading rate, accuracy, or eye fixations (Duranovic, et al., 2018; Kuster, et al., 2018; Rello & Baeza-Yates, 2013; Wery & Diliberto, 2017). The studies even found that dyslexics—children and adults—preferred reading standard fonts to the special ones (Harley, et al., 2016; Kuster, et al., 2018; Wery & Diliberto, 2017).

Only one study reported a benefit of any kind—the dyslexic students in this study read faster (Marinus, et al., 2016). This benefit apparently came, not from the font design, but from the spacing within and between words. The researchers increased the spacings in the standard fonts and the same effect was seen. Masulli (2018) likewise found that larger spacings improved the reading speed of dyslexics—but that effect was apparent with non-dyslexic readers, as well.

Reading faster is a good thing, of course, as long as reading comprehension is maintained. Unfortunately, these studies didn’t look at that.

The use of dyslexia fonts, then, is driven by the market—consumer demand being met by businesses—but not supported by evidence; neither the claims of the businesses nor the outcomes from implementing the fonts are justified by “scientific evidence.”

Just as Hooked on Phonics flourished in two different iterations (the first felled by court rulings that exposed the lack of research backing market claims), many reading and phonics programs in education are buoyed by ideology and the market but not by research.

But the traps around programs and “scientific” are extremely complex from two different angles.

First, as noted in several examples above, teaching and learning are likely not served well within a market dynamic whereby parents and students are the consumers and teachers and the schools serve the inexpert demands of those consumers.

Yes, parents and students have a right to express their need, but they most often lack the expertise to demand how that need should be met.

Parents of children with reading problems or dyslexia should be demanding that their children be served better and appropriately. But calling for specific policy and practice is outside the purview of those “consumers.” (This is the same dynamic in patients seeking doctors who prescribe antibiotics when they are not needed, creating a health hazard for themselves and others when medical best practice is usurped by market demand.)

The second trap, however, is “scientific” itself. As I have detailed, experimental and quasi-experimental research (what we mean by “scientific,” as Pearson discusses) draws causal relationships that can be generalized. By definition, then, generalizable research doesn’t address outliers or real-world situations where several factors impact the effectiveness of teaching and learning.

The “scientific” trap positions a parent of a child struggling to read, diagnosed with dyslexia, into a problematic corner if that child finds success with dyslexia fonts, a practice not supported by research.

Teaching and teachers must be guided by evidence, both the evidence of a wide range of research and the evidence drawn from the individual students in any classroom.

To teach is to quilt together what a teacher knows about the field, reading for example, and then to match instruction to where any student is and where any student wishes to go.

This, ironically, is the philosophy behind balanced literacy, the approach demonized (usually with false claims and without evidence) by those calling for the “science of reading.”

Each time advocacy for systematic intensive phonics for all students gains momentum, I ask the key question: In whose interest is the research being applied?

Go back to the new research on stents, a true life-and-death matter, and think about that question when you read the media demanding the “science of reading.”


For Further Consideration

Flu Outbreak Reduces Class Sizes To Level Appropriate For Learning

The Wrong “Scientific” for Education

The release of National Assessment of Educational Progress (NAEP) 2019 scores in math and reading, announced as an “emergency” and “devastating,” has thrown gasoline on the rhetorical fire that has already been sweeping across media—a call for “scientific” research to save public education in the U.S.

While the media and the public seem historically and currently convinced by the rhetoric of “scientific,” there is a significant irony to the lack of scientific evidence backing claims about the causes of NAEP scores; for example, some have rushed to argue that intensive phonics instruction and grade retention legislation have caused Mississippi’s NAEP reading gains while many have used 2019 NAEP scores to declare the entire accountability era a failure.

Yet none of these claims has the necessary scientific evidence behind it. There simply has not been the time or the effort to construct scientific studies (experimental or quasi-experimental) to identify causal factors in NAEP score changes.

Another problem with the rhetoric of “scientific” is that coinciding with that advocacy are some very disturbing contradictory realities.

And let’s not forget that for at least two decades, “scientific” has been central to No Child Left Behind and the Common Core—both of which were championed as mechanisms for finally bringing education into a new era of evidence-based practices.

We must wonder: If “scientific” is the answer to our educational failures, what has happened over the past twenty years of “scientific” being legislated into education, resulting in everyone shouting that the sky is falling because 2019 NAEP scores are down from 2017 as well as relatively flat since the early 1990s (30 of the 40 years spanning accountability)?

First, there is the problem of definition. “Scientific” is shorthand for a very narrow type of quantitative research, experimental and quasi-experimental research that is the gold standard of pharmaceutical and medical research.

To meet the standard of “scientific,” then, research in education would have to include random-sample populations of students and a control group in order to draw causal relationships and make generalizations. This process is incredibly expensive in terms of funding and time.

As I noted above, no one has had the time to conduct “scientific” research on 2019 NAEP data, so making causal claims of any kind about why NAEP scores dropped is necessarily not “scientific.”

But there is a second, and larger, problem with calling for “scientific” research in education. This narrow form of “scientific” is simply wrong for education.

Experimental and quasi-experimental research seeks to identify causal generalizations. In other words, if we divide all students into a bell-shaped curve with five segments, the meaty center segment would be where the generalization from a study has the greatest effectiveness. The adjacent two outer segments would show some decreasing degrees of effectiveness, leaving the two extreme segments at the far ends of the curve likely showing little or no effectiveness (these students, however, could have learned under instruction not shown as generally effective).
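To make those segment sizes concrete, here is a minimal sketch; cutting the curve at one and two standard deviations is my assumption (a common convention), not something specified by the research discussed above:

```python
# Share of a normal population falling into five bell-curve segments,
# cut at -2, -1, +1, and +2 standard deviations.
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

segments = {
    "far left  (below -2 sd)": phi(-2),
    "left      (-2 to -1 sd)": phi(-1) - phi(-2),
    "center    (-1 to +1 sd)": phi(1) - phi(-1),
    "right     (+1 to +2 sd)": phi(2) - phi(1),
    "far right (above +2 sd)": 1 - phi(2),
}
for name, share in segments.items():
    print(f"{name}: {share:.1%}")
# The "meaty center" holds about 68% of students; each extreme segment
# holds only about 2% -- exactly the outliers a generalization ignores.
```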

Yet, in a real classroom, teachers are not serving a random sampling of students, and there are no controls to assure that some factors are not causing different outcomes for students even when the instructional practice has been shown by scientific research to be effective.

No matter the science behind instruction, sick, hungry, or bullied students will not be able to learn.

The truth is, in education, scientific studies are nearly impossible to conduct, are often overly burdensome in terms of expense and time, and are ultimately not adequate for the needs of real teachers and students in real classrooms—where teaching and learning are messy, idiosyncratic, and impacted by dozens of factors beyond the control of teachers or students.

Frankly, nothing works for all students, and a generalization can be of no use to a particular student with an outlier need.

While we are over-reacting to 2019 NAEP reading scores, we have failed to recognize that there has never been a period in the U.S. when reading achievement was adequate; over that history teachers have implemented hundreds of different instructional strategies, reading programs, standards, and high-stakes tests—and we always find the outcomes unsatisfying.

If there is any causal relationship between how we teach and how students learn, it is a cumbersome matrix of factors that has been mostly unexamined, especially by “scientific” methods.

And often, history is a better avenue than science.

The twenty-first century has not been the only era calling for “scientific” in educational practice.

The John Dewey progressivism of the early twentieth century was also characterized by a call for scientific practice. Lou LaBrant, who taught from 1906 until 1971 and rose to president of the National Council of Teachers of English in the 1950s, was a lifelong practitioner of Deweyan progressivism.

LaBrant called repeatedly for closing the “gap” between research and practice, but she also balked at reading and writing programs—singular approaches to teaching all students literacy.

While progressive education and Dewey are often demonized and blamed for educational failure by mid-twentieth century, the truth is that progressivism has never been widely embraced in the U.S.

Today, however, we should be skeptical of the narrow and flawed call for “scientific” and embrace instead the progressive view of “scientific.”

For Dewey, the teacher must simultaneously teach and conduct research—what eventually would be called action research.

To teach, for progressives, is to constantly gather evidence of learning from students in order to drive instruction; in this context, science means that each student receives the instruction they demonstrate a need for and that produces some outcomes of effectiveness.

In an elementary reading class, some students may be working in read aloud groups while others are receiving direct phonics instruction, and even others are sitting in book clubs reading picture books by choice. None of them, however, would be doing test-prep worksheets or computer-based programs.

The current urge toward “scientific” seems to embrace the false assumption that with the right body of research we can identify the single approach for all students to succeed.

Human learning, however, is as varied as there are humans.

This brings us to the current “science of reading” narrative that calls for all students to receive intensive systematic phonics, purportedly because scientific research calls for such. The “science of reading” narrative also rejects and demonizes “balanced literacy” as not “scientific.”

We arrive then back at the problem of definition.

The “science of reading” advocacy is trapped in too narrow a definition of “scientific” that is fundamentally wrong for educating every student. Ironically, again, balanced literacy is a philosophy of literacy (not a program) that implements Deweyan progressive “scientific”; each student receives the reading instruction they need based on the evidence of learning the teacher gathers from previous instruction, evidence used to guide future instruction.

Intensive phonics for all begins with a fixed mandate regardless of student ability or need; balanced literacy starts with the evidence of the student.

If we are going to argue for “scientific” education in the U.S., we would be wise to change the definition, expand the evidence, and tailor our instruction to the needs of our students and not the propagandizing of a few zealots.

For two decades at least, the U.S. has been chasing the wrong “scientific” as a distraction from the real guiding principle, efficiency. Reducing math and reading to discrete skills and testing those skills as evidence for complex human behaviors are as misleading as arguing that “scientific” research will save education.

Teachers as daily, patient researchers charged with teaching each student as that student needs—that is the better “scientific” even as it is much messier and less predictable.

Rule 5: HTFU

In many ways, this past Tuesday was mostly a typical flight night at a local tap house, Growler Haus. We gathered and took over the room to the right, what we have come to call “The Office.”

But this Tuesday we were 15, gathered to say goodbye to a friend moving, as we say around here, “up North,” or in his case, back up North.

[Photo: Chris holding the plaque mock-up]

The “we” in this case is the Spartanburg, SC, cycling community—or at least a part of it. Cycling is vibrant—although the people change and the intensity shifts over the years—in the Upstate of SC from Spartanburg to Greenville especially.

Chris, pictured above holding a mock-up of a plaque in his honor, is moving away, and we spent a couple hours over pints, flights, and food smiling and laughing about his moving to Spartanburg years ago and finding his way into our not-so-warm-and-fuzzy cycling clique; in many ways we are worse than high school, we road and MTB cyclists who have also branched out to gravel riding (anything to justify even slightly the code of bicycle ownership—Rule #12: The correct number of bikes to own is n+1).

If you zoom in, you see the plaque mock-up includes below his name Rule #5—one of the dilemmas faced by those organizing the gesture of farewell.

You see Rule #5, among The Rules at Velominati, is mostly NSFW—Harden the fuck up, or as we say in most public settings, HTFU.

The 15 in attendance and pictured above range in ages from their 20s into their 60s, much like our cycling community, and we all at one point or another have found ourselves stressed to our limits, probably questioning why we were voluntarily participating in a hobby, for fun, something that left us near the brink of death—or simply wishing death would offer a bit of relief.

Recreational cycling is often competitive, both spontaneously on any bicycle ride including more than one rider and during organized events (even the ones that explicitly announce “this is not a race”).

When we are the ones dishing out the pace and pain (a few above are always in that group), we smile and quip: “You know what to do when you are getting dropped? Speed up.”

It is that sort of nonsense that has bonded this group, nonsense that is about as much sincerity as it is nonsense.

I have been cycling “seriously,” as we say, for well over thirty years. Those early years, I was mere fodder, a peon, but several of the elite locals gave me the treatment that we honor to this day, a sort of loving hazing, a relentless demand that “do better, damnit” is about the same as saying “love you.”

One of my friendly torturers was Fred; I still see him from time to time at mountain biking trails. He has shifted to solitary riding, and I have throttled back significantly my mileage and intensity. But I feel something unnameable every time I see Fred (Rule #3: Guide the uninitiated).

Fred was ruthless, and his ability on a bicycle left me in awe. I never came close to Fred in ability, but I crept toward his tenacity—and for a few years I certainly was a much better cyclist.

And I still know a hell of a lot about bicycling.

Several people in the photograph were shepherded into the flock as I was many moons ago. Now the grasshoppers have become masters; we guard our own Rules vigilantly even as we quote The Rules with a bit of a smile.

I hear our 20- and 30-somethings sigh and lament that things aren’t like they used to be, shaking their heads at new riders. And I understand.

We are varied people. Bicycles and most of all riding bicycles join us, even when we don’t think alike, even when we can barely raise our heads or turn the crank in exhaustion.

Maybe especially when things are the toughest.

Only a few years ago four of us above were struck by a motorist (with six others), and a few of us were injured, some badly, one permanently.

And very recently, we were all visiting Chris in the hospital after a freak cycling crash left him with a broken collarbone that put his cycling on pause for many weeks.

You see, it is just riding a bicycle, something children do, and it is far more than just riding a bicycle.

Cycling is not our entire lives—we have family, careers, and beer—but most of us cannot imagine our lives without cycling.

We will miss Chris, and I am sure, Chris will miss us, and this community.

In any moment of sadness, tugs of weakness, however, we have something to guide us through—HTFU.

Rule #1: Obey the rules.

On Poetry and Prose: Defining the Undefinable

As a professor of first-year writing, I spend a good deal of time helping students unpack what they have learned about the essay as a form and about writing in order to set much of that aside and embrace more nuanced and authentic awareness about both.

Teaching writing is also necessarily entangled with teaching reading. In my young adult literature course, then, I often ask students, undergraduate and graduate (many practicing teachers), to do similar unpacking about their assumptions concerning writing and reading.

I have noted before that my first-year students often mangle what I would consider to be very basic labels for writing forms and genres—calling a short prose piece a poem and identifying a play as a novel because they read both in book form.

Because of the ways students have been taught writing to comply with accountability and high-stakes testing of writing, they also confuse modes (narration, description, exposition, and persuasion) for genres or types of essays.

These overly simplistic or misguided ideas extend to distinguishing between fiction and non-fiction as well as prose and poetry.

I am always adding to my toolkit, then, lessons that ask students to investigate and interrogate genre, form, and mode, instilling a sense that literacy remains something undefinable that we nonetheless try to define so that we feel we have greater control over it.

This post details a lesson about recognizing all literacy as a journey, and embracing defining the undefinable.

The seeds of the lesson, in fact, start with my own stumbling through my journey with literacy. The first time I read Gate A-4 by Naomi Shihab Nye, I assumed the piece was a personal essay.

I think I may have shared with students and even referred to the passage as such. At some point after that, I ran across the piece being referred to as fiction, a very brief short story.

This week, as I was planning a lesson on how we distinguish poetry from short fiction, I considered using “Gate A-4” along with four poems by women poets—Adrienne Rich’s “Aunt Jennifer’s Tigers,” Maggie Smith’s “Good Bones,” Emily Dickinson’s “Wild nights – Wild nights!,” and Margaret Atwood’s “Siren Song.”

As I searched online for “Gate A-4,” I noticed that the piece was routinely identified as a poem. However, when I did a “Look Inside” search of Naomi Shihab Nye’s Honeybee: Poems & Short Prose, I discovered that the piece is clearly prose, one of what the book description identifies as “eighty-two poems and paragraphs.”

I also discovered a wonderful video of Nye reading the passage.

This became the opening for the lesson, which began with asking students to watch the read aloud without a text in front of them. After viewing, I asked them to identify the text form—what is this thing she is reading?

The students were cautious, even hesitant to answer, exposing, I think, the many elements of a text that advanced readers use to make a significant number of decisions in a very brief moment. We know poetry from prose simply from seeing the text, even before reading.

As we struggled, I handed out a copy of “Gate A-4” and explained that it is prose (although some guessed poem). I also pulled up the Amazon link and showed them the piece in the original book.

Next I placed them in small groups with the four poems noted above, asking them to use one or as many of them as they wanted to create a quick lesson on what makes a poem, a poem.

The first group decided to use all four poems, and began by noting that students would identify in “Aunt Jennifer’s Tigers” what most people associate with poetry—rhyme and stanzas.

They also recognized that, turning to “Good Bones,” those assumptions were challenged, as they explained, since this poem doesn’t rhyme and has no stanzas (which we later clarified to note it is simply one stanza, constructed of lines).

Since “Aunt Jennifer’s Tigers” and “Wild nights – Wild nights!” tend to conform to narrow and traditional characteristics associated with poetry, and “Good Bones” and “Siren Song” look poetic but sit outside those characteristics, we began to brainstorm broader concepts; for example, we explored that all the poems have repetition (noting that rhyme is sound repetition) and concluded that poetry is often driven by purposeful line and stanza form.

Possibly the key moment of this discussion was when the second group added that the best we can say is that a poem is a poem because the writer identifies it as such. We have come to a similar conclusion about the genre of young adult literature.

Another important part of this exploration came from a student who explained that he had always been bothered by trying to write poetry in high school, specifically the concept of line breaks. The how of breaking lines eluded him.

Here is something I always emphasize when teaching someone to write poetry—the art and craft of line breaks.

Broadly, we can help students better understand form and genre by keeping them focused on prose as a work driven by purposeful sentence and paragraph formation and poetry as a work driven by purposeful line and stanza formation (recognizing that even poetry sometimes is prose poetry).

To help answer this student’s concern about line breaks, I pulled up my newest poem about my father’s death, “quotidian,” and walked the class through my first draft (typed in Notes on my iPhone and emailed to myself) as well as how I came to choose and then work within the stanza pattern.

The big-picture lessons from this activity include the following:

  • Helping students understand that writing forms, genres, and modes are driven (not constrained) by some conventions, but also fluid.
  • Exploring that writers of all types of genres and forms work from a very similar toolbox—writers of poetry and prose care about sound, for example.
  • Emphasizing form and meaning are related in writing, but as soon as anyone finds a firm definition, a piece challenges that.
  • Identifying how writers and readers navigate form, genre, and mode with purposefulness as well as awareness. As I explained about line breaks and stanzas when writing poetry, there is no magical formula, but most poets do seek some guiding pattern or patterns and then shape poetry within or against those patterns.

Many years ago as a high school English teacher, I gradually shifted away from defining poetry during our poetry unit, choosing instead to ask throughout, “What makes poetry, poetry?” We simply came to understand poetry better by asking a question instead of finding a clear definition.

I remain convinced that seeking greater awareness about text is a long journey, best guided by always seeking a definition rather than imposing one.

Regardless of the definition we discover, or fail to uncover, I hope that students remain in awe as I am each time I read “Gate A-4” even as I also remain conflicted about just what the thing is she is reading aloud on the video.

What Is the Relationship among NAEP Scores, Educational Policy, and Classroom Practice?

Annually, the media, public, and political leaders over-react to and misrepresent the release of SAT and ACT scores from across the US. Most notably, despite years of warnings from the College Board against the practice, many persist in ranking states by average state scores, ignoring that vastly different populations are being incorrectly compared.

These media, public, and political reactions to SAT and ACT scores are premature and superficial, but the one recurring conclusion that would be fair to emphasize is that, as with all standardized test data, the most persistent correlations to these scores include the socio-economic status of the students’ families as well as the educational attainment of their parents.

Over many decades of test scores, in fact, educational policy and classroom practices have changed many times, and the consistency of those policies and practices has been significantly lacking and almost entirely unexamined.

For example, when test scores fell in California in the late 1980s and early 1990s, the media, public, and political leaders all blamed the state’s shift to whole language as the official reading policy.

This was a compelling narrative that, as I noted above, proved to be premature and superficial—relying on the most basic assumptions of correlation. A more careful analysis exposed two powerful facts: California test scores were far more likely to have dropped because of drastic cuts to educational funding and a significant influx of English language learners and (here is a very important point) even as whole language was the official reading policy of the state, few teachers were implementing whole language in their classrooms.

This last point cannot be emphasized enough: throughout the history of public education, because teaching is mostly a disempowered profession (primarily staffed by women), one recurring phenomenon is that teachers often shut their doors and teach—claiming their professional autonomy by resisting official policy.

November 2019 has brought us a similar and expected round of making outlandish and unsupported claims about NAEP data. With the trend downward in reading scores since 2017, this round is characterized by the sky-is-falling political histrionics and hollow fist pounding that NAEP scores have proven policies a success or a failure (depending on the agenda).

If we slip back in time just a couple decades, when the George W. Bush administration heralded the “Texas miracle” as a template for No Child Left Behind, we witnessed a transition from state-based educational accountability to federal accountability. But this moment in political history also raised the stakes on scientifically based educational policy and practice.

Specifically, the National Reading Panel was charged with identifying the highest quality research in effective reading programs and practices. (As a note, while the NRP touted its findings as scientific, many, including a member of the panel itself [1], have discredited the quality of the findings and have accurately cautioned against political misuse of the findings to drive policy.)

Here is where our trip back in history may sound familiar during this current season of NAEP hand wringing. While Secretary of Education (2005-2009), Margaret Spellings announced that a jump of 7 points in NAEP reading scores from 1999 to 2005 was proof No Child Left Behind was working. The problem, however, was in the details:

[W]hen then-Secretary Spellings announced that test scores were proving NCLB a success, Gerald Bracey and Stephen Krashen exposed one of two possible problems with the data. Spellings either did not understand basic statistics or was misleading for political gain. Krashen detailed the deception or ineptitude by showing that the gain Spellings noted did occur from 1999 to 2005, a change of seven points. But he also revealed that the scores rose as follows: 1999 = 212; 2000 = 213; 2002 = 219; 2003 = 218; 2005 = 219. The jump Spellings used to promote NCLB and Reading First occurred from 2000 to 2002, before the implementation of Reading First. Krashen notes even more problems with claiming success for NCLB and Reading First, including:

“Bracey (2006) also notes that it is very unlikely that many Reading First children were included in the NAEP assessments in 2004 (and even 2005). NAEP is given to nine year olds, but RF is directed at grade three and lower. Many RF programs did not begin until late in 2003; in fact, Bracey notes that the application package for RF was not available until April, 2002.”
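Simply laying out the year-to-year changes in the scores Krashen cites makes the sleight of hand plain:

```python
# Year-over-year changes in the NAEP reading scores Krashen cites.
scores = {1999: 212, 2000: 213, 2002: 219, 2003: 218, 2005: 219}

years = sorted(scores)
for prev, curr in zip(years, years[1:]):
    print(f"{prev} -> {curr}: {scores[curr] - scores[prev]:+d}")
# 1999 -> 2000: +1
# 2000 -> 2002: +6   <- nearly all of the gain, before Reading First began
# 2002 -> 2003: -1
# 2003 -> 2005: +1
```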

Jump to 2019 NAEP data release to hear Secretary of Education Betsy DeVos shout that the sky is falling and public education needs more school choice—without a shred of scientific evidence making causal relationships of any kind among test data, educational policy, and classroom practice.

But an even better example has been unmasked by Gary Rubinstein, who discredits Louisiana’s Chief of Change John White (praised by former Secretary of Education Arne Duncan) for proclaiming that his educational policy changes caused the state’s NAEP gain in math:

So while, yes, Louisiana’s 8th grade math NAEP in 2017 was 267 and their 8th grade math NAEP in 2019 was 272 which was a 5 point gain in that two year period and while that was the highest gain over that two year period for any state, if you go back instead to their scores from 2007, way before their reform effort happened, you will find that in the 12 year period from 2007 to 2019, Louisiana did not lead the nation in 8th grade NAEP gains.  In fact, Louisiana went DOWN from a scale score of 272.39 in 2007 to a scale score of 271.64 in 2019 on that test.  Compared to the rest of the country in that 12 year period.  This means that in that 12 year period, they are 33rd in ‘growth’ (is it even fair to call negative growth ‘growth’?).  The issue was that from 2007 to 2015, Louisiana ranked second to last on ‘growth’ in 8th grade math.  Failing to mention that relevant detail when bragging about your growth from 2017 to 2019 is very sneaky.
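A quick check using only the figures Rubinstein quotes shows how the chosen window flips the story:

```python
# Louisiana 8th-grade math NAEP scale scores, as quoted by Rubinstein.
scores = {2007: 272.39, 2017: 267.0, 2019: 271.64}

print(f"2017 to 2019 (White's window): {scores[2019] - scores[2017]:+.2f}")  # +4.64
print(f"2007 to 2019 (full window):    {scores[2019] - scores[2007]:+.2f}")  # -0.75
```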

The media and public join right in with this political playbook that has persisted since the early 1980s: Claim that public education is failing, blame an ever-changing cause for that failure (low standards, public schools as monopolies, teacher quality, etc.), promote reform and change that includes “scientific evidence” and “research,” and then make unscientific claims of success (or yet more failure) based on simplistic correlation and while offering no credible or complex research to support those claims.

Here is the problem, then: What is the relationship among NAEP scores, educational policy, and classroom practice?

There are only a couple of fair responses.

First, 2019 NAEP data replicate a historical fact of standardized testing in the US—the strongest and most persistent correlations to that data are with the socio-economic status of the students, their families, and the states. When students or average state data do not conform to that norm, these are outliers that may or may not provide evidence for replication or scaling up. However, you must consider the next point as well.

Second, as Rubinstein shows, the best way to draw causal relationships among NAEP data, educational policy, and classroom practices is to use longitudinal data; I would recommend at least 20 years (reaching back to NCLB), but 30 years would add in a larger section of the accountability era that began in the 1980s and was in wide application across almost all states by the 1990s.

The longitudinal data would next have to be aligned with the current educational policy in math and reading for each state correlated with each round of NAEP testing.

As Bracey and Krashen cautioned, that correlation would have to accurately align when the policy is implemented with enough time to claim that the change impacted the sample of students taking NAEP.
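As a rough illustration of that alignment step, consider the minimal sketch below; the states, years, scores, policy dates, and two-year lag are all hypothetical placeholders, not actual NAEP or policy records:

```python
# Sketch: align longitudinal test results with state policy adoption dates,
# flagging only administrations that could plausibly reflect the policy.
import pandas as pd

# Hypothetical longitudinal NAEP-style results (state, test year, scale score).
naep = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "year":  [2015, 2017, 2019, 2015, 2017, 2019],
    "score": [218, 219, 217, 214, 216, 220],
})

# Hypothetical reading-policy adoption years.
policy = pd.DataFrame({"state": ["A", "B"], "policy_year": [2018, 2013]})

# Per the Bracey/Krashen caution: only count test administrations that occur
# after the policy has had time to reach the tested students (assumed lag: 2 years).
LAG = 2
merged = naep.merge(policy, on="state")
merged["post_policy"] = merged["year"] >= merged["policy_year"] + LAG
print(merged)
```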

But that isn’t all, as complex and overwhelming as this process already is.

We must address the lesson from the so-called whole language collapse in California by documenting whether or not classroom practice implemented state policy with some measurable level of fidelity.

This process is a herculean task, and no one has had the time to examine 2019 NAEP data in any credible way to make valid causal claims about the scores and the impact of educational policy and classroom practice.

What seems fair, however, to acknowledge is that there is no decade over the past 100 years when the media, public, and political leaders deemed test scores successful, regardless of the myriad of changes to policies and practices.

Over the history of public education, also, before and after the accountability era began, student achievement in the US has been mostly a reflection of socio-economic factors, and less about student effort, teacher quality, or any educational policies or practices.

If NAEP data mean anything, and I am prone to say they are much ado about nothing, we simply do not know what that is because we have chosen political rhetoric over the scientific process and research that could give us the answers.


[1] See:

Babes in the Woods: The Wanderings of the National Reading Panel, Joanne Yatvin

Did Reading First Work?, Stephen Krashen

My Experiences in Teaching Reading and Being a Member of the National Reading Panel, Joanne Yatvin

I Told You So! The Misinterpretation and Misuse of The National Reading Panel Report, Joanne Yatvin

The Enduring Influence of the National Reading Panel (and the “D” Word)