Automated test grading has moved way past Scantron bubble sheets


Even before the pandemic, one way technology has been creeping into students' lives is through grading, and we're not just talking about those multiple-choice bubble sheets that have been around for decades. The Educational Testing Service, which creates statewide assessments for K-12 students along with higher ed tests like the GRE, has been using artificial intelligence to grade essays since 1999. But can AI really tell good writing from bad? Andreas Oranje is vice president of assessment and learning technology development for ETS. He says systems are trained to look for things like style, grammar and how arguments are built.

A system is trained to extract a lot of features of good and bad writing. And so what that system does is it aggregates all the data from an essay after processing it, and then it produces some kind of professional score. The scores of the computer are being compared to a human rater, and then if there's a big discrepancy, usually a third rater comes in to resolve the discrepancy.

So it's actually sort of a check on the human reviewer?

The way we use it, it is a check on the human reviewer.

Do you see the AI ever being good enough to replace teachers in the sort of regular grading that they do, which I know is one of many teachers' least favorite activities?

Yeah. So I really believe in joining forces of computers and humans, and so having systems that help teachers, for example, spot things like, "Hey, a lot of your students are struggling with this topic. Maybe next lesson you want to take a little bit more time expanding on that." Or maybe, you know, the grading, right? Especially the more low-key grading with fairly straightforward answers at the lower levels. That's something that a computer can do really well, very reliably. And so helping a teacher so that teacher can spend more time on personalizing their instruction and giving individual attention, I think it's a great way to go about this.

How often do you see the AI catching examples of poor writing that a human has missed, and that second expert having to come in?

It depends on the exam. So for a practice exam, it may be different than for the GRE. But we have pretty strict rules surrounding that, where more than a one-and-a-half-point difference is already more than we want to tolerate, and you're talking about maybe a six-point scale. And so, you know, it depends on the exam. It can be 5%, it can be 10%. But I think the important point is that whenever there's some discrepancy, we're going to check into it. We're never letting something go without the check.

We've seen stories of students, you know, kind of gaming the system, just writing a bunch of keywords that they know the system is going to be looking for and getting a perfect score. What's the usefulness of that kind of grading if it can easily be gamed?

If a system is that easily gameable, it's obviously not a very good system and it needs to be revisited. Now, you have to take into account what the purpose was. So if the purpose of a system is to help a student in a low-stakes environment learn, or to help inform a teacher, the needs of that system are obviously very different than if you make life decisions about admitting someone, or other, you know, very impactful decisions based on it. And so it depends on how you use it.

Clearly, there are some school systems that are using AI in this way. What role do you see the companies that make the technology having in preventing it from being used in a way that you think is not actually that helpful?

We have done a bunch of presentations over the past couple of years to kind of set up some key questions that we believe teachers should ask. Is the data that is being used in these systems appropriate for my students? Are these models related to what my students need? Can I intervene in the system? Those are the kinds of questions that teachers need to ask. And I think it's really important that as we move further and further into this automated world, we really tool people and arm people with those questions, so they are able to make good judgments about whether these tools are good for the learning that happens in their classrooms or not.

Andreas, you've been with the company for a really long time working on this technology. Do you see a point when your work is done, when it all works as well as human grading?

I don't think that point will ever come, and not because these systems aren't progressing really rapidly, because they're getting better and better. But the reason why I don't think that will ever happen is because as soon as we get these systems really well done to do what we know now, we as a society want way more things. We are always wanting more things than we wanted before, and so our creativity and our desire for new things will always outpace what we can model at any point in time.

Andreas Oranje is with ETS.
