What if students could sleep in on the day of the final examination and let a computer program take the test for them? Let’s imagine a hypothetical adaptive software program — call it NeverTest — capable of taking examinations and performing consistently with student ability. Were such a system implemented, teachers could grade the tests without making nervous students sit through them.
Yes, it’s a thought experiment — but bear with me. The student would work with NeverTest on assignments during the term, and when it came time to measure performance, the program, not the student, would take the examination, making the mistakes its algorithm predicted the student would make.
All of the accuracy without any of the anxiety.
The notion is hardly outrageous. Increasing evidence suggests that the use of learning analytics during the term can accurately predict final grades. Surely it’s a small step to have the software take the exam and make the predicted mistakes. All that’s needed is a large enough data set; the more assignments, the more accurate the predictions.
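The core mechanism can be sketched simply. A minimal illustration, with entirely hypothetical data and function names: tally a student’s error rate on each topic across the term’s assignments, then answer the exam on the student’s behalf, deliberately missing questions on topics where the historical error rate is high.

```python
from collections import defaultdict

def topic_error_rates(assignment_results):
    """Estimate a per-topic error probability from (topic, correct) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for topic, correct in assignment_results:
        totals[topic] += 1
        if not correct:
            errors[topic] += 1
    return {t: errors[t] / totals[t] for t in totals}

def take_exam(exam_topics, rates, threshold=0.5):
    """Answer each question, missing those on topics where the student's
    historical error rate meets the threshold."""
    return [rates.get(t, 0.0) < threshold for t in exam_topics]

# Hypothetical term history: the student is solid on algebra, shaky on geometry.
history = [("algebra", True), ("algebra", True), ("algebra", False),
           ("geometry", False), ("geometry", False), ("geometry", True)]
rates = topic_error_rates(history)
answers = take_exam(["algebra", "geometry", "algebra"], rates)
# algebra error rate = 1/3, so those questions are answered correctly;
# geometry = 2/3, so that question is deliberately missed
```

A real system would model far more than topic frequencies, but the sketch shows why the data-hunger follows: with only a few records per topic, the estimated rates are noisy, and the "predicted mistakes" become guesses.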
Does the proposal make you uneasy? Two possible reasons come to mind: First, the students might cheat on the assignments the algorithm uses to predict exam errors; and, second, the software might err in its predictions.
Let’s take these concerns in order.
Sadly, college students do cheat. A lot. What has come to be called “contract cheating” — in which a third party completes the student’s work — is on the upswing around the world. In some studies, more than 15 percent of students admit to having cheated at least once. A nontrivial number of cheaters probably lie on surveys about cheating, so the proportion is likely higher still.
Elite universities are as vulnerable as anyplace else. In 2015, the provost of Stanford issued an open letter about “an unusually high number of troubling allegations of academic dishonesty,” including one “that may involve as many as 20 percent of the students in one large introductory course.” In 2012, Harvard investigated charges that some 125 students in a single course — half the enrollment — had colluded on assignments.
One might therefore imagine that in a course using my hypothetical NeverTest software, a large number of students would break the rules. If the point of the adaptive software is to evaluate student strengths and weaknesses throughout the term in preparation for the final examination, a student could simply hire someone smarter or harder-working or less anxious, then let NeverTest evaluate her strengths and weaknesses instead.
But this concern matters only if net cheating would increase under NeverTest. I’d suggest, to the contrary, that net cheating might actually fall. It’s one thing to pay a substitute to sit for a final examination, a single concrete event; it’s something else to pay a substitute to complete all the other assignments during the term.
Besides, there are plenty of ways to ensure that the right person is sitting in front of the computer, including some that students experience as intrusive. That experience could be avoided by making NeverTest optional. Only those who preferred not to sit for the final examination would use the adaptive software during the term. But even if the early users were mainly students with unusual levels of examination anxiety, it’s easy to imagine that the software might swiftly come to be the default.
The larger concern, surely, is that the NeverTest algorithm could be wrong. It might inaccurately predict the errors the students would have made had they sat for the final examination themselves. Every teacher has known students who struggle throughout the term only to blossom unexpectedly at the end. NeverTest wouldn’t capture the result of the hard work and determination that carries these students successfully through their difficulties.
But I wonder whether the harm suffered by the student who outperforms on final exams might be balanced by the harm avoided by the student who underperforms. A preference for rewarding the student who peaks late over the student who peaks early might represent nothing more than status quo bias.
In any case, the incidence of both errors might be reduced by helping my hypothetical software make better evaluations of student ability. In computer science courses, performance on early assignments turns out to be a significant predictor of the final grade. It’s easy to imagine this result replicated in other STEM courses, and perhaps in economics or foreign languages — all fields where constant homework yields constant feedback.
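The kind of prediction at issue is ordinary statistics. A minimal sketch with invented numbers: fit a least-squares line from early-assignment averages to final grades, then use it to predict a new student’s result. The data and the 80-point query are assumptions for illustration, not findings from the studies mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: average score on early assignments vs. final grade.
early = [55, 62, 70, 78, 85, 91]
final = [58, 60, 72, 75, 88, 93]
slope, intercept = fit_line(early, final)

# Predict the final grade of a student averaging 80 on early assignments.
predicted_final = slope * 80 + intercept
```

The fit is only as good as the data, which is the essay’s point: a course with many objective assignments gives the line something to stand on, while a course graded on one exam and some classroom impressions does not.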
I suspect that students outperform predictions most frequently in traditional lecture-and-examination courses in humanities and social sciences (and of course law). Thus the late-blossoming student might be an artifact of having few data points — or perhaps taking into account the wrong ones. Classroom performance, for example, might have little to do with measurable ability. (A 2018 study of medical students found no correlation between classroom attendance and examination grade.) Moreover, evaluation of classroom question-and-answer is notoriously subject to racial and gender biases.
For the NeverTest software to gather sufficient data, the course would need a sufficiently large number of assignments — and the questions would have to be as objective as possible. Maybe this is asking too much of those who teach outside of a handful of subjects. But the influence of anxiety on examination performance is a long-established finding in social science.
All of this, as I said, is simply a thought experiment. But I wouldn’t be surprised if it’s also the future. So, to my friends in Silicon Valley: I’ll be sitting by the phone.