Reflections on What Pandemic-Related State Test Waiver Requests Suggest About the Priorities for the Use of Tests
University of Illinois at Urbana-Champaign
American Institutes for Research/CALDER
University of Washington/CEDR
CALDER Policy Brief No. 26-0721
Students’ performance on standardized tests is clearly predictive of their later outcomes (Goldhaber & Özek, 2019), but whether the costs of administering tests are justified by the value of the tests for improving students’ outcomes is controversial. This controversy fuels heated debates over federal testing requirements, such as those instituted under No Child Left Behind (NCLB) and continued under the Every Student Succeeds Act (ESSA). All ESSA-required tests were canceled in the 2019-20 school year due to COVID-19, and there was significant political debate about whether they ought to be required in 2020-21. Ultimately, the U.S. Department of Education (USDOE) allowed some testing flexibility in the form of ESSA waivers.
Below, we outline some of the common ways states might use tests to improve student outcomes and the implications of requested and approved waivers for these uses. In particular, we highlight features of waiver requests that are especially important if tests are to be used by specific actors (e.g., families) to benefit students in specific ways. We conclude with a discussion of how testing policy needs to be designed if statewide tests are to be useful and maintain political support.
Beliefs about how testing can improve education typically fall into at least one of three categories. First, tests can be diagnostic tools to target educational supports. This includes common student-level interventions, such as targeted tutoring or decisions about advanced coursework (e.g., Kraft, 2021; McEachin et al., 2020). Tests are also sometimes used to direct supports to schools, as with NCLB’s requirement that School Improvement Funds and technical assistance be directed to schools identified as needing support based on test scores (McClure, 2005). Second, test results can be useful for research and evaluation. Test scores are widely used as a primary outcome variable in work assessing the efficacy of educational interventions, practices, and policies, or to assess achievement gaps between groups of students.
Finally, tests, as required by both NCLB and ESSA, are a prominent part of school accountability systems (O’Keefe et al., 2021). The idea here is that rewards or sanctions connected to test results drive improvements to school or teacher practices. For example, statewide standardized tests may be used to evaluate teachers based on their contributions to students’ test scores, and publicly reporting schools’ test performance may prompt families to put pressure on school administrators (e.g., by exercising school choice). Similarly, federal School Improvement Grants fund various types of whole-school reforms that are targeted based on student test performance (USDOE, 2010).
More broadly, tests are used to assess where states (and the country as a whole) stand in terms of overall achievement and in serving student subgroups, in order to inform policy and practice. For instance, there is significant concern about the degree to which the COVID-19 pandemic has set back educational achievement, and much of the early empirical evidence on that question is based on test scores (e.g., Caprariello, 2021). Assessments have long been used to ascertain the degree to which the nation is addressing the significant gaps in achievement that exist between students based on race/ethnicity and/or poverty (Goldhaber et al., 2018). While this is not a direct use of test scores, such assessments may be quite important in focusing attention and resources on the achievement gaps that exist in schools (Hess, 2011).
In this policy brief, we focus on some of the ways that tests might improve education. We emphasize "might" because, as we describe below, there is considerable debate and controversy about the usefulness of standardized tests, and test utility depends very much on how testing policies are designed, who is supposed to use the results (e.g., teachers, families, or policymakers), and what the results are supposed to be used for. While not exhaustive, Table 1 illustrates many of the most common combinations of purpose and user imagined by testing advocates. Specific uses of testing may not fit unambiguously into a single cell of Table 1, but the table offers a framework for thinking about how standardized tests might be used, and about how ESSA waivers might limit such uses.
Table 1: Potential Uses of Test Results to Improve Student Outcomes
Importantly, while Table 1 outlines potential uses and users of test results, it is not clear that this potential is being fully realized or that the benefits exceed the downsides of test administration. For example, as we argue below, state test results are often not disaggregated in ways that identify individual students’ skills or needs, so they rarely provide actionable and timely information. And testing critics argue that test-based accountability leads to a narrowing of the curriculum and to excessive time spent on test preparation (e.g., Koretz, 2009, 2017).
While federal testing requirements are popular in the abstract (Henderson et al., 2019), this support appears fragile. For example, 41% of respondents in 2020 endorsed the view that there is “too much emphasis on achievement testing” in public schools, up from 37% in 2008 and 20% in 1997 (PDK Poll, 2020). And net support for testing requirements falls by 20 points when respondents are told that test administration takes on average eight hours per year (Henderson et al., 2019). State testing is popular, but that support comes with reservations and may be malleable. The pandemic and ESSA testing waivers introduce additional uncertainty about the use of state tests, which could have important implications for the viability of testing policy.
Concerns about testing in a pandemic prompted USDOE to allow states to request waivers for some ESSA testing requirements. Specifically, on February 22, 2021, USDOE issued formal guidance allowing states to request temporary ESSA flexibility in three areas (Rosenblum, 2021). First, states could request waivers from ESSA’s accountability and school identification requirements, such as the requirement to use test data to differentiate schools. Second, some public reporting requirements could be waived, especially those related to test scores. Third, USDOE offered flexibility around test administration and recommended that states consider practices such as remote administration and lengthier testing windows.
Because this waiver request process allows states to ask to jettison components of federal test-related policies, it may shed light on which of the three aforementioned purposes of testing, for different users of testing data, are most salient to states and how tests are likely to be used going forward.
Table 2 summarizes the initial formal requests that states made for waivers from ESSA’s test-related requirements; USDOE approvals are in bold. In our discussion, we focus on how the waiver requests would, if approved, influence the ways test results can be used. But first, it is important to emphasize that waiver requests were not as extensive as permissible. In particular, we point to what is absent from Table 2: only 12 states are represented in the table, indicating that the vast majority did not request any waivers beyond the minimum that USDOE guaranteed to requesting states. That the requests for waivers were not more widespread and ambitious may indicate that many policymakers have faith in the value of their annual testing regimes for at least some purposes.
Table 2: Features and Approval Status of States’ Initial Waiver Requests[a]
Notes. [a] Empty cells indicate that no waiver was requested, or that the waiver requested no change relative to the baseline. Bold cells indicate that USDOE approved the request. Waiver requests may include features and other waivers not described here (e.g., extended testing windows; waiver of the 1% cap on participation in the alternate test). This table reflects states’ initial waiver requests; some states (like Oregon) have submitted a second request after receiving feedback from USDOE. As of 6/02/2021, thirty-six states (including the District of Columbia) and the Bureau of Indian Education (AZ, CA, CO, CT, DE, DC, FL, GA, IL, IN, KY, MA, MD, MI, MN, MS, MT, NC, ND, NE, NJ, NM, NV, OH, OK, OR, PA, SC, SD, TX, UT, VA, VT, WA, WI, WV) received “Accountability and School Level Identification Waivers”, which waive school differentiation and the 95% of students tested requirements, among other things. An example waiver can be found here.
[b] In its conversation with USDOE, California explained that it planned to administer the state summative tests except in districts where it is “not viable” to do so because of the pandemic. USDOE approved this but added, “Please note that viability refers to the ability to administer the statewide summative test given a district’s specific circumstances in the context of the pandemic. It does not provide an opportunity for States or school districts to choose to administer local tests in place of the statewide summative test.”
Several requested changes, many of them approved, provide flexibility in the timing of tests, test length, the grades and subjects tested, and the test instrument itself. While all these approved changes provide districts with flexibility, they sacrifice some degree of statewide or cross-year comparability of test results. Most prominently, nine state education agencies asked that districts have flexibility over the specific tests they administer. While this would not make cross-district comparisons impossible, such comparisons would certainly be more challenging, similar to comparing achievement across states that use different tests (Kuhfeld et al., 2019).
The use of local assessments would have no immediate impact on accountability, both because accountability provisions are waived this year and because most states, including those requesting flexibility around local choice of exam, use test score levels to assess schools rather than year-to-year growth measures. But growth-based measures tend to be favored by researchers as a means of identifying school quality (Polikoff, 2017), and the local assessment option may hamper any momentum toward growth-based accountability in states with this waiver (e.g., Fensterwald, 2021).
The use of local assessments would also hamper policymakers who wish, for instance, to determine which districts are handling COVID-19 recovery efforts well or poorly. Similarly, a lack of comparability across districts could hamper the identification of students for interventions or specialized (e.g., gifted and talented) programs.
Testing flexibility has enormous implications for educational research. Test results from 2021 will necessarily be difficult to use as a baseline (Ho, 2021), and changes to instruments will further limit comparisons to tests used in the 2021-22 school year. And if a state does not administer a test in a grade or subject (e.g., cancelling science tests or assessing math and ELA only in alternating grade levels, as some states requested), no baseline will be available in those cases. Moreover, states did not administer tests in 2020, and more than one “gap” year of test score data makes modeling student growth, which is often essential for effective research and evaluation, highly impractical (Fazlul et al., 2021).
Even where common tests are administered statewide, other flexibility granted to states may make results difficult to compare across jurisdictions or years. In particular, waiving the requirement that 95% of eligible students participate in testing may result in participation rates that vary across student subgroups, making comparisons across years and between groups less reliable. Consequently, uncertainty about student achievement, and about how it was affected by the pandemic or by various interventions, is likely to echo across education research, policy, and practice for years.
Diagnostic uses of test results (e.g., for remediation) are less likely where states have delayed their testing, as USDOE encouraged, because later testing is in turn likely to delay the receipt of results. In one extreme case, New Jersey received approval to move the state test into the fall. This would prevent districts from using the results as a diagnostic tool to inform summer programming, though locally administered tests may still be available. Similarly, results released in the fall cannot help families select appropriate academic interventions for their children over the summer.
States are also supposed to “provide to each individual parent…information on the level of achievement and academic growth of the student…on each of the State academic assessments.” Even prior to the pandemic, these individual reports to families looked quite different from state to state and often did not provide much information beyond whether students were below or above state standards. This does not mean that parents do not receive test-based information. They may, for instance, receive data from locally administered tests, and/or state test results could trigger conversations between teachers and parents in which information is conveyed. Nevertheless, waivers will make it more challenging to get parents detailed, actionable information. The challenges arise not only because delayed testing windows push reporting further out, but also because shorter or alternative tests likely limit the degree to which the assessments can be used to let parents know where their children stand in terms of specific skills.
Of the 40 specific waiver requests outlined in the cells of Table 2 (excluding requests to waive the common statewide test requirement), the Biden administration approved 28 (70%) at least in part (shown in bold). It is notable, then, that there appears to be one requirement that USDOE was particularly unlikely to waive: namely, that schools administer a common assessment statewide. Despite many requests by states to rely on LEA-chosen assessments, only the District of Columbia (where most LEAs are charter school networks) was able to secure such a waiver. Why would the federal government draw a particularly hard line here, while granting considerable flexibility in other areas and acknowledging – both implicitly and explicitly – that state tests from 2021 are likely to be of limited use (e.g., for accountability)? We speculate that there was concern that even temporarily waiving statewide tests would give momentum to those advocating for the elimination of testing altogether. That is, USDOE (and perhaps states that did not request that common assessments be waived) may be less interested in what happens with testing this year than worried about a slippery slope toward increasingly lax testing requirements.
Our discussion of states’ waiver requests is not intended to be exhaustive, but rather to reflect on some features of testing policy and what waiver requests might tell us about how test results might be used. In light of that discussion, we conclude with two more general recommendations for policymakers.
First, policymakers and advocates of standardized testing should be more explicit in linking specific features of testing policy to specific theories of action about who will be using the test results and what they will be using the results for. Vague assertions that “we cannot fix what we do not measure” may be rhetorically useful, but they provide little rationale for any specific testing regime. Testing policies that are not motivated by specific theories of action for how their results will be used are likely to generate results that are underutilized, if they are used at all. A potentially useful example of an unusually clear theory of action can be found in Washington state’s waiver request. Although the plan was not approved in full, the waiver request explicitly outlined a statewide system of classroom, school, district, and state assessments, each with a clear purpose (Washington Office of Superintendent of Public Instruction, 2021).
Second, we have a warning for those who – like us – believe that standardized tests can play a useful role in improving educational outcomes: Maintaining political support for such tests – and especially for statewide standardized tests – probably depends on demonstrating their diagnostic value to both educators and families, as these uses tend to be viewed most favorably (PDK Poll, 2020). Accountability policies are controversial, and research and evaluation are too far removed from the lives of most students, families, and educators to inspire deep political support for standardized tests. Yet the diagnostic uses that might rally public support by providing concrete and immediate benefits to students and schools have often been neglected even by staunch advocates of standardized testing. This is evidenced in part by how poorly state testing policy is typically designed for diagnostic purposes. For example, even prior to the pandemic, state test results often took too long to arrive and provided too little accessible, useful information about the knowledge and skills students need to develop for teachers or families to direct targeted support to children (Goodman & Hambleton, 2004; Marsh et al., 2006; Mulvenon et al., 2005). It is therefore no surprise that many are resistant to administering tests during a pandemic that has already drastically disrupted instructional time.
Sketching out a comprehensive plan for diagnostic state tests is beyond the scope of this essay, but English Language Proficiency (ELP) testing may provide some useful lessons. As shown in Table 2, even states requesting waivers from various ESSA requirements for testing in math, ELA, and science typically requested at most minor waivers from ESSA’s ELP requirements. On the one hand, this may reflect a belief that such requests would not be granted, or the absence of a strong political constituency opposed to ELP testing in the way that teachers’ unions and many families oppose subject testing. On the other hand, the lack of a strong anti-ELP-testing constituency is arguably a sign of political support for ELP testing in its own right.
Another possibility, then, is that ELP testing is genuinely more popular than other forms of statewide standardized testing. That could be because ELP testing has institutionalized popular diagnostic uses in a way that other types of statewide standardized testing have not. In our experience, stakeholders generally believe that ELP results are at least somewhat credible signals of important student skills and can be used to classify students as needing specific kinds of productive interventions (viz., English learner services). In other words, because it can provide timely, credible information that specific educators are expected to use in specific ways, ELP testing policy may have developed robust political and institutional support in a way that other aspects of state and federal testing policy have not.
The COVID-19 pandemic has heightened the salience of pre-existing concerns that administering statewide standardized tests is not worth it, even as it has heightened concerns that many students need substantial interventions to address lost learning opportunities and that gaps have widened. We encourage policymakers to think carefully, explicitly, and publicly about how they have tailored their standardized testing policies to achieve various diagnostic, research, and accountability objectives. This will help to ensure that standardized tests have benefits for more schools and students and will bolster fragile political support for statewide tests.
2 For a more expansive discussion of the various uses of tests, see Ho (2014).
3 State test results, for instance, often take several months to make it into the hands of educators or families, impeding the use of testing to help individual students (Marsh et al., 2006).
4 Of course, the decision to request waivers reflects complex political dynamics. Additionally, we have information only on observed waiver requests, which might reflect informal (and unobserved) negotiations with USDOE and suppositions by policymakers about what is likely to be approved.
5 In a few cases states modified requests over time, such as Colorado’s initial request to eliminate science testing, which was later modified to administer science tests in grades 8 and 11.
6 Moreover, only 36 states requested the separate “Accountability and School-Level Identification” waivers, whose approval was guaranteed and which eliminate requirements that 95% of eligible students be tested and that results be used to differentiate schools.
7 See Section 1112(e)(1)(B) of: https://www.k12.wa.us/sites/default/files/public/assessment/statetesting/pubdocs/WA_Smarter_HS_samplescorereport.pdf
8 For more about the content and format of reports that go to families, see Goodman and Hambleton (2004).