I failed the test because…

We have all taken tests. Most of us have done well on some and not as well on others. The goal of any test is for the differences between individuals' test scores to be a result of differences in the thing you want to measure (e.g., knowledge, understandings) rather than a result of other things (e.g., room temperature, hunger, poverty).

A few interesting studies released recently suggest that other things are affecting student test scores. It is not surprising to anyone that these things exist, but in this era of high-stakes tests, especially those that are used to inform staffing and salary decisions, these differences are of particular concern.

First, according to this article in Education Week, several states and districts have analyzed student scores and found that their data suggest a “mode effect” on the PARCC test. Some researchers have found a substantial difference in proficiency rates between students who took a paper-and-pencil version of the test and those who took the test online. While one researcher suggested that tests given in such vastly different ways cannot be expected “to measure the same things” (Briggs, cited in Herold, 2016, para. 6), this is simply not acceptable when teacher evaluations, and therefore personnel decisions, are based at least in part on the results of these assessments. For teachers, the prospect that the tests may not be measuring what they are supposed to measure is a very scary one. The whole idea of using students’ test scores in teacher evaluations is based on the premise that the scores reflect what students have learned, and that more effective teaching will produce greater learning and, therefore, better scores. Whether or not this is appropriate is content for another blog entirely, but if something besides students’ knowledge, skills, and understandings is impacting their test scores, the test scores no longer have any relationship to the instruction in classrooms. At this point, researchers are not sure why students who took the test on the computer did not do as well as their paper-and-pencil counterparts, though some are reasonably suggesting that it may have to do with computer literacy or comfort with technology.

But what if there is a completely unrelated phenomenon at work? Another interesting and potentially related piece of research has surfaced. This research from the University of Illinois (described in ChicagoInno) found that students who were able to view nature immediately prior to taking an assessment scored better on the assessment than students who had a view of man-made structures or were in classrooms with no windows. Maybe it is a stretch to relate this to the “mode effect,” but what if students who took the exams online were in a computer lab, which is less likely to have windows than a traditional classroom? (Note: Our personal experiences suggest that computer labs tend to be in internal rooms that lack windows, primarily as a result of security concerns, though we have neither collected nor seen data on this.) Could it be that the mode effect is actually a “nature effect”? What other variables that we haven’t even considered may be causing this phenomenon? Until we are certain that computerized tests are accurately measuring the knowledge, skills, and understandings that we are trying to assess, and not something else, we need to continue to administer the paper-and-pencil versions. There is obviously a lot more research to be done, and we suggest that it needs to be done before the era of paper-and-pencil testing ends completely.


Herold, B. (2016, February 4). Comparing Paper-Pencil and Computer Test Scores: 7 Key Research Studies. Education Week Digital Education Blog. Retrieved from