
But can it be measured?

The component of ESSA that requires schools to include one non-academic measure in assessments of school performance seems like a step in the right direction, at least in recognizing that school success is about more than just academics. Some schools are already moving in this direction, and that is raising concerns among researchers about the quality of existing measures of non-academic skills and their appropriateness for measuring school quality.

Measurement is really an interesting field. Those of us who make our living measuring things are rather particular about how, where, for what, and with whom measures are used. When we conduct evaluations, educators often tell us to use “reliable instruments.” Instruments, however, are not inherently reliable or unreliable. Consider the SAT. Intended to be administered to high school students, it’s a measure of “the reading, writing and mathematics skills that students learn in school, and that are critical for success in college” (College Board, SAT, para. 1). It is generally a well-respected instrument, a reliable measure of what high school students have learned, and a relatively good predictor of college success. If I choose to administer it to sixth graders, however, it is no longer a good measure of either student learning or college success. If I want to measure high school students’ keyboarding skills, it’s completely useless. It is a reliable and valid measure of particular things with particular groups of people. Those two caveats, particular things with particular people, are central to understanding what reliability and validity mean.

The purpose for which an instrument was designed, therefore, is a key factor in its reliability and validity. When it comes to measuring social and emotional learning, the instruments that exist were created to provide information about individual students; they were not intended to assess the effectiveness or quality of schools. As Angela Duckworth (who created some of these instruments and has used them extensively in her own research) is quoted as saying in this article from the New York Times, “all measures suck, and they all suck in their own way.”

While researchers and evaluators support using multiple measures to assess effectiveness, quality, and success, in this case the measures may simply not be up to the task yet. We will need much more research and instrument development before we have reliable and valid measures of these constructs that we can confidently use as a component of school effectiveness ratings. Even after years of additional work, it is possible that these are simply not constructs that are appropriate to use in this context, for this purpose.

Camille Farrington notes in the New York Times article that, “In education, we have a great track record of finding the wrong way to do stuff.” That is easy to do, as there are far more wrong ways than right ways in most cases. For many years under NCLB, we likely measured schools wrong by overemphasizing standardized test scores in determining school quality. Many have questioned whether these tests, developed to measure student learning, are really adequate measures of teacher and school quality. Now we have a chance to expand the definition of quality to include new constructs. We need to take the time to determine both what those constructs should be and how to measure them, so that we are confident in the resulting data. This time, let’s slow down and try to get it right.


College Board (2016). SAT. Available at http://research.collegeboard.org/programs/sat