These AI Detectors Are Getting Out of Hand

Warning: This post is over 365 days old. The information may be out of date.

Recently, I noticed a ton of posts on Reddit where students were accused of academic dishonesty and plagiarism through the use of “AI” tools. It started with a single post, and after that, the reports just started to flood in.

The possibilities here are:

  • All of those students are lying, and they really did use AI
  • They aren’t lying, and the AI detection software screwed up

Seems obvious enough. Let’s figure out how AI detection software works. To do that, we need a base example. Let’s go with Turnitin’s AI detector.

Why Turnitin?

Let’s get the bias out of the way: I have a deep-seated hatred toward Turnitin.

Back in high school, Turnitin accused me of plagiarizing an essay that I wrote on my own, and gave one of my classmates a 100% plagiarism score because they had first submitted the essay to a separate Turnitin test environment to make sure they didn’t accidentally plagiarize anything.

But that was before the AI tools started to flood the market. Turnitin’s pre-AI tool “checked” for “plagiarism” by essentially comparing all the text submitted to their platform. If two submissions matched and the match rate was above a certain threshold, the submission would be flagged for review.
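To make that concrete, here’s a minimal sketch of threshold-based matching in Python. This is not Turnitin’s actual algorithm, which I have no visibility into. The names `match_rate` and `flag_for_review` and the 0.8 threshold are all made up for illustration, and the similarity measure is just the standard library’s `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; the real value is not stated anywhere in this post.
FLAG_THRESHOLD = 0.8


def match_rate(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_for_review(submission: str, corpus: list[str],
                    threshold: float = FLAG_THRESHOLD) -> bool:
    """Flag the submission if it matches any stored text at or above the threshold."""
    return any(match_rate(submission, other) >= threshold for other in corpus)


essay = "The mitochondria is the powerhouse of the cell."
unrelated = "I went sailing last summer with my grandfather."

# The corpus already contains an earlier upload of the very same essay.
print(flag_for_review(essay, [unrelated, essay]))
# The corpus contains only unrelated text.
print(flag_for_review(essay, [unrelated]))
```

Notice that a text always matches itself perfectly, which is presumably how my classmate ended up with a 100% score after uploading their own essay twice.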

Obviously, that wouldn’t work anymore with GPT-based cheating. Depending on the prompt and the model and the platform used, the output could vary drastically, even between runs! Turnitin had to come up with a new solution to avoid becoming irrelevant overnight with their already-flawed, outdated method of comparing text content.


Turnitin’s “AI writing detection”

Turnitin capitalized on the scare of “AI-based cheating” by creating an entire page dedicated to showing off their new “AI detection” solution.

If you ever go to that page accidentally, you’ll see a whole lot of marketing speak, corporate mumbo-jumbo (“AI Innovation Lab”), but nothing of substance. How does the damn detection work, Turnitin?!

That nugget of info is buried in their FAQ page, and even the information listed there is vague and unhelpful. I mean, read this:

  1. How does it work? (…blah blah technical details…) The segments are run against our AI detection model and we give each sentence a score between 0 and 1 to determine whether it is written by a human or by AI. (…more technical stuff…) Currently, Turnitin’s AI writing detection model is trained to detect content from the GPT-3 and GPT-3.5 language models, which includes ChatGPT. Because the writing characteristics of GPT-4 are consistent with earlier model versions, our detector is able to detect content from GPT-4 (ChatGPT Plus) most of the time. We are actively working on expanding our model to enable us to better detect content from other AI language models.

Okay, so it appears Turnitin checks if the submission was generated by an AI by… feeding it into another AI model and asking it to predict whether it was written by a human or an AI. And look at the last sentence – they need to make a new model for each AI language model that crops up on the market.
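Here’s a rough sketch of what “score each sentence between 0 and 1” could look like in Python. To be clear, this is my guess at the general shape, not Turnitin’s code: `split_sentences`, `ai_fraction`, the 0.5 cutoff, and especially `toy_scorer` are hypothetical stand-ins, since the real scoring model is a black box.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def ai_fraction(text: str, score_sentence: Callable[[str], float],
                cutoff: float = 0.5) -> float:
    """Return the fraction of sentences whose score is at or above the cutoff."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = sum(1 for s in sentences if score_sentence(s) >= cutoff)
    return flagged / len(sentences)


def toy_scorer(sentence: str) -> float:
    """Stand-in for the black-box model: long sentences get a high 'AI' score."""
    return 0.9 if len(sentence.split()) > 12 else 0.1


sample = ("Short one. This is a much longer sentence that keeps going "
          "on and on well past twelve words total.")
print(ai_fraction(sample, toy_scorer))
```

The toy scorer is deliberately silly, but that’s the point: whatever the final percentage says, it is only as good as the per-sentence guesses underneath it.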

So Turnitin went from an objective measure (how much of the text was copy-pasted) to a completely subjective measure made out of an AI black box. Obviously, this new system is much more trustworthy. (/s)

And they expect instructors to trust them on this and completely ruin students’ lives based on this prediction score!

Well, actually…

Even Turnitin itself acknowledges that this method of detection is flawed. In their blog post titled “Understanding false positives within our AI writing detection capabilities”, Turnitin writes the following:

  1. Know before you go—make sure you consider the possibility of a false positive upfront and have a plan for what your process and approach will be for determining the outcome. Even better, communicate that to students so that you have a shared set of expectations.
  2. Assume positive intent—in this space of so much that is new and unknown, give students the strong benefit of the doubt. If the evidence is unclear, assume students will act with integrity.
  3. Be open and honest—it is important to acknowledge that there may be false positives upfront, so both the instructor and the student should be prepared to have an open and honest dialogue. If you don’t acknowledge that a false positive may occur, it will lead to a far more defensive and confrontational interaction that could ultimately damage relationships with students.

So why are these horror stories occurring across different colleges? Well, this disclaimer is tucked away in a blog post, that’s why. Now, I’m not sure if they put a giant warning banner on top of their AI prediction score, but I’m pretty confident in saying that even if they did, it wouldn’t read anything like “This score may be completely inaccurate because we’re basically asking an AI to perform a Turing test on fellow robots”, because if they actually wrote that, they’d lose the trust of educators and go out of business.

Therefore, some educators assume the AI detection model of Turnitin is battle-tested and ready for general use, not taking into account that the prediction score itself has a compounded margin of error.
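To see why a flag on its own is weak evidence, here’s a back-of-the-envelope calculation in Python using Bayes’ rule. Every number below is hypothetical (I don’t have Turnitin’s real rates), and the function names `prob_actually_ai` and `expected_false_accusations` are mine.

```python
def prob_actually_ai(prevalence: float, detection_rate: float,
                     false_positive_rate: float) -> float:
    """P(essay is AI-written | detector flagged it), via Bayes' rule."""
    true_flags = prevalence * detection_rate
    false_flags = (1 - prevalence) * false_positive_rate
    total_flags = true_flags + false_flags
    if total_flags == 0:
        return 0.0  # the detector never flags anything
    return true_flags / total_flags


def expected_false_accusations(honest_essays: int,
                               false_positive_rate: float) -> float:
    """Expected number of honest essays that get flagged anyway."""
    return honest_essays * false_positive_rate


# Hypothetical scenario A: 10% of essays are AI-written.
print(prob_actually_ai(prevalence=0.10, detection_rate=0.90, false_positive_rate=0.01))
# Hypothetical scenario B: only 1% of essays are AI-written.
print(prob_actually_ai(prevalence=0.01, detection_rate=0.90, false_positive_rate=0.01))
# Honest essays wrongly flagged out of 10,000, at a 1% false positive rate.
print(expected_false_accusations(10_000, 0.01))
```

With these made-up inputs, a flag is right about 91% of the time in scenario A but less than half the time in scenario B, and either way roughly 100 honest essays out of every 10,000 get flagged. A small-sounding error rate turns into a lot of wrongly accused students once you multiply it by the number of submissions.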

But let’s stop bashing on Turnitin for a moment. Because I want to talk about “AI detection” in general.

AI detection, models, “hallucination” and datasets

All this “AI detection” sucks because the very idea of “detecting AI” sucks.

Look at it this way: we’re warned against trusting output from AI models because they may “hallucinate” and spit out completely incorrect information. AI detectors are built off of AI models. So how come we’re asked to trust AI detectors?

Furthermore, the power of these GPT-based tools comes from the giant dataset that they use to give us information. Stuff like writing code, creating a recipe, etc. is easy for AI models to do, since they only have to regurgitate what they learned from the datasets. But with AI detection, the models are asked to operate on a completely new input (generated from the aforementioned GPT models) and determine something that was never included in their dataset – whether or not a given piece of text was AI-generated. The very idea of AI detection only existed as Turing tests on the Internet, and it never could’ve accounted for all the GPT models flooding the landscape right this moment.

Case in point: one of the “AI detectors” named GPTZero actually marked the Declaration of Independence as AI-generated. I never knew the founding fathers were GPT models!

So, what now?

Instructors around the world need to adapt, just like they did when the Internet first came out and allowed dishonest students to copy-paste whatever they found online.

Instead of the AI detection prediction score becoming a measure, it should only be used as a reference. If the student performs badly all the time, then it can be used as part of the evidence set to show that the student is committing academic dishonesty. If the student is a model student and one of their essays is flagged as AI-generated, then educators will know that there was a false positive result.

For this to happen, all these companies should become much more transparent about their detection capabilities, and emphasize that their tool should only be a single part of an arsenal used to determine academic honesty, not a be-all-end-all solution. This may be hard to do, since these tools are sold to educational organizations and distributed in bulk to educators. Without proper documentation and training, educators may have a false understanding of the tool and use it in ways that it wasn’t intended to be used.

And honestly, writing out that last sentence reminded me of the pre-AI Turnitin. And GPT chatbots. And virtually every other piece of software or tool in existence. If you don’t fully understand the limitations and characteristics of what you’re using, then you probably shouldn’t blindly trust the results when everything’s said and done.

I hope this blog post was useful! If you or any other student you know was misidentified as a robot, try sharing this blog post with your instructor and let them know the dangers of relying on AI detectors.

Update (2023-07-27)

OpenAI recently pulled their AI detector because it had poor accuracy.

When the most popular GPT-based chatbot maker shuts down their detector, it really raises the question: how do other companies manage to make detectors that surpass OpenAI’s? (Hint: they can’t, and don’t.)