OpenAI’s latest AI model, GPT-4.5, has made headlines after outperforming humans in a rigorous new version of the Turing test, a long-standing benchmark for machine intelligence. In a recent study conducted by researchers at the University of California, San Diego, GPT-4.5 was mistaken for a human 73% of the time, more often than the actual human participants were.
The Turing test, proposed by British computer scientist Alan Turing in 1950, evaluates a machine’s ability to exhibit human-like intelligence by measuring whether human judges can distinguish it from a real person in conversation. The recent study involved nearly 300 student participants, each randomly assigned to act as either a judge or a human “witness.” In each round, the judge held text-only conversations with two witnesses, one human and one chatbot, and had to determine which was which.
Remarkably, GPT-4.5 not only passed the test but outperformed the human participants at convincing judges of its humanity. When prompted to adopt a human-like persona, the model was judged to be a real person more often than the actual humans it was compared against.
Other AI systems were also evaluated in the same test:
- Meta’s LLaMa 3.1 405B was judged to be human 56% of the time.
- ELIZA, an early chatbot from the 1960s, scored 23%.
- OpenAI’s previous model, GPT-4o, was judged to be human only 21% of the time.
Cameron Jones, a researcher at UC San Diego’s Language and Cognition Lab, commented on the findings, noting that “people were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt),” and that “4.5 was even judged to be human significantly more often than actual humans!”
While the results are groundbreaking, not all scholars agree that passing the Turing test equates to true intelligence. Melanie Mitchell, a leading AI expert and professor at the Santa Fe Institute, argues that the Turing test is more reflective of human assumptions than a definitive measure of intelligence.
Writing in the journal Science, Mitchell noted, “The ability to sound fluent in natural language, like playing chess, is not conclusive proof of general intelligence.” She also pointed to a 2024 Stanford University study that evaluated GPT-4 on psychological and behavioral metrics, arguing that the study’s version of the Turing test may not align with Turing’s original vision.
The debate around artificial general intelligence (AGI) continues, but GPT-4.5’s performance signals a major leap in human-like language capability, raising new questions about AI’s evolving role in society and communication.
Stay tuned to DC Brief for further updates on this story and other technology developments.