OpenAI’s latest AI model, GPT-4.5, has made headlines after outperforming humans in a rigorous new version of the Turing test, a long-standing benchmark for machine intelligence. In a recent study conducted by researchers at the University of California, San Diego, GPT-4.5 was mistaken for a human 73% of the time, more often than the actual human participants were.
The Turing test, proposed by British computer scientist Alan Turing in 1950, evaluates a machine’s ability to exhibit human-like intelligence by measuring whether human judges can distinguish it from a real person in conversation. The recent study involved nearly 300 student participants, each randomly assigned to act as either a judge or a human “witness.” In each round, the judge held text-only conversations with two witnesses, one human and one chatbot, and had to determine which was which.
Remarkably, GPT-4.5 not only passed the test but outperformed the human participants at convincing judges of its humanity. When prompted to adopt a human-like persona, the model was judged to be a real person more often than the actual humans it was compared against.
Other AI systems were also evaluated in the same test:
- Meta’s LLaMa 3.1 405B was judged to be human 56% of the time.
- ELIZA, an early chatbot from the 1960s, scored 23%.
- OpenAI’s previous model, GPT-4o, was judged to be human only 21% of the time.
Cameron Jones, a researcher at UC San Diego’s Language and Cognition Lab, commented on the findings, noting that “people were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt),” and that “4.5 was even judged to be human significantly more often than actual humans!”
While the results are groundbreaking, not all scholars agree that passing the Turing test equates to true intelligence. Melanie Mitchell, a leading AI expert and professor at the Santa Fe Institute, argues that the Turing test is more reflective of human assumptions than a definitive measure of intelligence.
Writing in the journal Science, Mitchell noted, “The ability to sound fluent in natural language, like playing chess, is not conclusive proof of general intelligence.” She also pointed to a 2024 Stanford University study that evaluated GPT-4 on psychological and behavioral metrics, arguing that the study’s version of the Turing test may not align with Turing’s original vision.
The debate around artificial general intelligence (AGI) continues, but GPT-4.5’s performance signals a major leap in human-like language capability, raising new questions about AI’s evolving role in society and communication.
Stay tuned to DC Brief for further updates on this story and other technology developments.