OpenAI Study: Models Give Wrong Answers 75% of the Time When Trained to Guess Rather Than Express Uncertainty

GigaNectar Team


Language models like ChatGPT often confidently state incorrect facts – a problem known as “hallucination.” This issue frustrates users who rely on AI for accurate information, but new research from OpenAI sheds light on why these errors persist and how they might be fixed. 

The False Birthday Problem 

When researchers asked a popular AI about Adam Kalai’s birthday (one of the paper’s authors), it confidently gave three different incorrect dates on separate occasions. Similarly, when asked about his PhD dissertation title, multiple AI systems produced completely fabricated – yet convincing – answers. 

These aren’t random glitches. According to OpenAI’s new research paper, hallucinations stem from two key factors: statistical limitations in training and misaligned evaluation methods. 

The Test-Taking Problem 

The core issue resembles how students approach exams. When uncertain about an answer on a multiple-choice test, students often guess rather than leave a blank, because guessing offers a chance at points while blanks guarantee zero. 

Similarly, AI systems are evaluated using benchmarks that reward accuracy (percentage of correct answers) but don’t penalize wrong answers more than abstentions. This creates a powerful incentive for models to guess rather than admit uncertainty. 

As OpenAI explains: “If a language model is asked for someone’s birthday but doesn’t know, and it guesses ‘September 10,’ it has a 1-in-365 chance of being right. Saying ‘I don’t know’ guarantees zero points.” 
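A quick back-of-the-envelope calculation (a hypothetical sketch, not taken from the paper) makes that incentive concrete:

```python
# Expected score under accuracy-only grading: correct = 1, wrong = 0, abstain = 0.
# A blind guess at a birthday has roughly a 1-in-365 chance of being right.

p_correct_guess = 1 / 365

expected_score_guess = p_correct_guess * 1 + (1 - p_correct_guess) * 0   # ~0.0027
expected_score_abstain = 0.0                                             # "I don't know" scores nothing

print(f"Guessing:   {expected_score_guess:.4f}")
print(f"Abstaining: {expected_score_abstain:.4f}")
# Under accuracy-only scoring, guessing always has the higher expected score,
# no matter how unlikely the guess is to be right.
```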

The Numbers Tell the Story 

This incentive problem appears clearly in evaluation data. OpenAI compared two models on the SimpleQA test:

  • The newer GPT-5 model abstained 52% of the time (saying “I don’t know”), gave correct answers 22% of the time, and wrong answers 26% of the time
  • An older model rarely abstained (1%), had slightly better accuracy (24%), but produced wrong answers 75% of the time 

While they scored similarly on accuracy, the hallucination rates differed dramatically. Yet most leaderboards focus solely on accuracy metrics, encouraging development of models that guess rather than express uncertainty. 
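To see why accuracy alone hides the difference, here is a minimal sketch that recomputes both metrics from the figures quoted above (the dictionary keys are just shorthand labels for the two systems described, not official model names):

```python
# SimpleQA figures quoted above, as fractions of all questions.
models = {
    "GPT-5 (newer)": {"correct": 0.22, "wrong": 0.26, "abstain": 0.52},
    "older model":   {"correct": 0.24, "wrong": 0.75, "abstain": 0.01},
}

for name, r in models.items():
    attempted = r["correct"] + r["wrong"]
    accuracy_when_answering = r["correct"] / attempted if attempted else 0.0
    print(f"{name}: overall accuracy={r['correct']:.0%}, "
          f"hallucination rate={r['wrong']:.0%}, "
          f"accuracy when it answers={accuracy_when_answering:.0%}")

# Overall accuracy differs by only 2 points, but the wrong-answer rate
# differs by roughly a factor of three.
```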

The Statistical Origins 

The paper also explains why hallucinations begin during pretraining. Unlike spelling patterns that follow consistent rules, arbitrary facts like birthdays cannot be predicted from patterns alone. There’s a mathematical connection between hallucinations and “missing mass” – the unavoidable statistical limitation that some rare facts will be poorly represented in training data. 

Because base models are trained to fit the distribution of their training data, even a well-calibrated model will confidently produce plausible but wrong completions for facts that appear rarely, or only once, in that data.
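The paper's missing-mass argument is statistical, but a toy Good-Turing-style calculation gives the flavor: the share of facts seen exactly once in training is a rough floor on how often a calibrated model will err on facts of that kind. The data below is invented purely for illustration:

```python
from collections import Counter

# Toy "training corpus": how often each person's birthday is mentioned.
# Facts that appear only once (singletons) are where a calibrated model
# is most likely to hallucinate, per the missing-mass argument.
birthday_mentions = ["alice", "alice", "alice", "bob", "bob", "carol", "dave", "erin"]

counts = Counter(birthday_mentions)
singletons = sum(1 for c in counts.values() if c == 1)

# Good-Turing-style estimate: probability mass of unseen or barely seen facts
# is roughly (number of singletons) / (total observations).
singleton_rate = singletons / len(birthday_mentions)
print(f"Singleton rate: {singleton_rate:.0%}")  # rough floor on errors for such facts
```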

The Solution: Change the Scoring 

OpenAI proposes a straightforward fix: modify how AI systems are evaluated. Rather than just counting correct answers, evaluations should penalize wrong answers more than abstentions. 

This mirrors how some standardized tests use “negative marking” – deducting points for incorrect answers to discourage guessing. The researchers recommend including explicit confidence thresholds in instructions, such as:

“Answer only if you are >75% confident, since mistakes are penalized 3 points, while correct answers receive 1 point, and ‘I don’t know’ receives 0 points.” 
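Under that rubric, the break-even confidence falls directly out of the arithmetic. A short sketch (assuming the quoted values of +1 for correct, -3 for wrong, and 0 for abstaining) shows why 75% is the threshold:

```python
def expected_score(p_correct: float, reward: float = 1.0, penalty: float = 3.0) -> float:
    """Expected score for answering with confidence p_correct under the quoted rubric."""
    return p_correct * reward - (1 - p_correct) * penalty

# "I don't know" always scores 0, so answering only pays off above the
# break-even confidence: reward*p - penalty*(1-p) = 0  =>  p = penalty / (reward + penalty).
break_even = 3.0 / (1.0 + 3.0)   # = 0.75, matching the ">75% confident" instruction
print(f"Break-even confidence: {break_even:.0%}")

for p in (0.5, 0.75, 0.9):
    print(f"confidence={p:.0%}: answering scores {expected_score(p):+.2f}, abstaining scores 0")
```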

Breaking Common Misconceptions 

The research challenges several widespread beliefs:

  • Hallucinations are NOT inevitable – models can abstain when uncertain
  • Avoiding hallucinations does NOT require advanced intelligence – a smaller model can be better at knowing its limits
  • A perfect 100% accurate model is NOT possible – some questions are inherently unanswerable 

The field needs not just better hallucination evaluations but wholesale reform of how primary evaluations are scored. As OpenAI concludes: “If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess.”
