I tweeted the other day about some quizzes I had taken that yielded results that were…unexpected. The resulting conversations ran the gamut from the relative merits of requiring leaders of technical teams to be technical folk to simply commiserating about the “impostor syndrome”-triggering nature of failing even a badly done test—and I fully intend to write about some of these—but that’s not this post. This post is about a different concept the assessments made me think about: the incredible difficulty of making good testing materials, and one strategy for making better ones.
The fact that teaching people is tricky and difficult should come as no surprise, but what I discovered only after a couple of years of doing it was that the hard part isn’t the actual teaching—that part is fairly simple if you know the topic well and can communicate with any degree of clarity. The hard part is honoring your initial intent with all of your materials, but especially exams and the like.
I started my undergrad—well, the final time I started my undergrad—as a working software developer fully a decade into my career. By the time I sat down to take one of my first exams in a class purported to be an introduction to programming logic, I had been writing programs for double that time. That test bothered me to such a degree that it haunted my thoughts for years to come as I continued my career both as an instructor and a developer. Why was it such a terrible exam? How could a fantastic teacher create such a bad evaluation tool?
More than half of the test consisted of questions best described by the following template:
What is the definition of {word}?
- Obviously wrong answer
- Answer that looks right save for one very small error
- Answer that could be correct, but is clearly wrong
- Correct answer
Of the remaining questions, most deviated from this formula only by not specifically requesting a definition. My favorite example of this particular type of question (presented to you by virtue of the fact that I’m a digital hoarder with decades of bullshit on my hard drive):
An array is:
- A collection of values stored in one variable referenced by index 1 to n
- A collection of values stored in one variable referenced by index 0 to n-1
- A single beam of light
- A list of similar but unrelated items
There is a host of problems with this question, but for someone who had spent time programming in Pascal and Fortran, both in school and professionally, in the years prior to this exam, this question was really galling.1
The crux of the problem is that there wasn’t even any point to the latter half of the text of the “correct” answer. It’s clearly been very clumsily tacked on as a counter to the “trick” answer. Getting this question “wrong” by answering (1) doesn’t indicate that you don’t understand the material—at best it indicates that you were unclear on a nuance. More importantly, answering this question “correctly” doesn’t even indicate a fundamental understanding of what an array is—as evidenced by the lackluster results of the first practical exercises when we used arrays.
The instructor took their eyes off the prize and forgot what their intent was in giving the exam in the first place. So many tests make this exact mistake. The purpose of this exam was stated in print at the top of the first page:
The purpose of this exam is to demonstrate a basic understanding of how [to] use the foundational components of a computer program…
In most applications, simply knowing the definition of a word—especially to a pedantic degree—does not afford one any more ability to be proficient in a thing than not knowing the definition.2 Wouldn’t the following question have better suited the purpose?
For the following questions, use an array defined in C as follows:
char letters[5] = {'h', 'e', 'l', 'l', 'o'};
What index would you use in C to request the letter ‘e’?
If you answer this question correctly, I now know that you know how to USE this foundational component of a computer program, not merely that you can recite its definition. It’s still an imperfect question, but it is already more aligned with my exam’s intent. But wait! What if you need to ask a vocabulary question in order to satisfy the intent of the exam? This isn’t uncommon, but the vocabulary question should be phrased in such a way as to test understanding over recitation.
In the bad array question above, the language used in the potential answers was taken directly from the teaching material. This is often done for a very rational, well-intentioned reason: to AVOID tricking students with changes in wording. The problem is that it doesn’t really prove that the student understands the concept, which—again—was the stated intent of the exam. It provides evidence that they can recite the verbiage you already provided, not that they know what it means. Validating understanding of vocabulary in a way that is quickly and easily gradable (read: not short answer) is tricky, but there are a number of strategies that have been shown to succeed.
The most common is multiple choice (as above), but with the actual correct answer being a derivation of the textbook answer and the other answers being derivations of other vocabulary items from the material being taught. This can be done in the single-answer format (again, as above), or in a many-to-many format (as in “draw a line between each word and its definition”). Asking the respondent to select synonyms and/or antonyms can also be valuable in some cases.
Strategies notwithstanding (and if you’d like more in-depth information on strategies like these, I highly recommend How Learning Works by Ambrose, Bridges, et al.), all of this is secondary to resolutely ensuring that you choose mechanisms for evaluation that adhere to the reason you chose to evaluate the student to begin with.
It is challenging. Even with this knowledge, and even after taking numerous courses on pedagogy, I still struggled with making my testing materials valuable to students. Some time after I had taken this fraught exam, I found myself giving exams that were in no way better than those I am describing here. During one frustrating exam creation session, I got up, walked to the dry erase board in my office, and wrote the following3 in huge letters directly in my line of sight:
I want to know that students that pass this exam will know exactly how to use the things I test them on here in practical ways, and that students that get questions wrong will know exactly what they need to study to be able to use those things.
I want no students that know the answer to a question to get it wrong.
I don’t want my exam to be clever, I want my students to be clever.
Simply, I wrote my objective statement where I could see it. I made my intent…well…intentional, and I did so in a manner that increased the likelihood that it would impact my actual behavior. The positive direction that this pushed my teaching and my students was palpable. The test that I was writing at the time immediately felt more “right” to me than any I had created before. Each subsequent quiz and exam moved closer and closer to the ideal I had in mind because each time I looked at my material I found new ways that it wasn’t honoring my intention. As my skill as an instructor improved, so, too, did my ability to find ways to meet those objectives.
The results weren’t simply gut feel, though. Exam scores improved, but more importantly so did the results of all project work and labs. My sample was small, but my pass rate went up by a small-but-measurable percentage. Better still, the students that came out of my classes started being lauded as particularly “well prepared” for higher-level courses to follow. In short, I hadn’t made the tests easier—I had made them more effective.
Years after this epiphany (and I use that term very loosely here), I had the pleasure of getting positive feedback from a student at the end of my course. She was switching careers from a decidedly non-technical field to that of a developer, and among the things she said, one that stood out to me was the observation that her test anxiety and impostor syndrome did not manifest so intensely on my exams; that, as she opined, the exams “didn’t try to make her feel stupid.” Her software career has surpassed my own at this point, and I delight in the idea that this change in course might have played some small part in that.
In my experience, there’s no magic bullet that creates great exams. It is only through conscious, mindful attention to the goals of the exercise that you can hope to end up with the desired result. Conscious, mindful attention…and a ridiculous amount of practice. As an aside to the armchair quarterbacks out there mean-spiritedly snarking about “shitty tests” I extend this invitation: create an exam about something you know very well and see how difficult it is to make something of which you can be proud. I think you’ll be surprised.
1 The “correct” answer was 2, but Pascal, Fortran, and numerous other languages start their indices at 1 rather than 0. Further, most modern languages (even at that time) allow for non-numeric indices, making the question even more grossly inaccurate.
2 There are exceptions, obviously—knowing what “flammable” or “caustic” means could be pretty important in a lab setting, for example.
3 In reality, it was probably something very similar; it subtly mutated over time, but this is pretty close to what I wrote.