By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
First reported by TechCrunch, OpenAI's system card detailed the PersonQA evaluation results, designed to test for hallucinations. From the results of this evaluation, o3's hallucination rate is 33 percent, and o4-mini's hallucination rate is 48 percent — almost half of the time. By comparison, o1's hallucination rate is 16 percent, meaning o3 hallucinated about twice as often.
The system card noted how o3 "tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims." But OpenAI doesn't know the underlying cause, simply saying, "More research is needed to understand the cause of this result."
OpenAI's reasoning models are billed as more accurate than its non-reasoning models like GPT-4o and GPT-4.5 because they use more computation to "spend more time thinking before they respond," as described in the o1 announcement. Rather than largely relying on stochastic methods to provide an answer, the o-series models are trained to "refine their thinking process, try different strategies, and recognize their mistakes."
However, the system card for GPT-4.5, which was released in February, shows a 19 percent hallucination rate on the PersonQA evaluation. The same card also compares it to GPT-4o, which had a 30 percent hallucination rate.
In a statement to Mashable, an OpenAI spokesperson said, “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability.”
Evaluation benchmarks are tricky. They can be subjective, especially if developed in-house, and research has found flaws in their datasets and even how they evaluate models.
Plus, different evaluators rely on different benchmarks and methods to test accuracy and hallucinations. HuggingFace's hallucination benchmark evaluates models on the "occurrence of hallucinations in generated summaries" from around 1,000 public documents and found much lower hallucination rates across the board for major models on the market than OpenAI's evaluations. GPT-4o scored 1.5 percent, GPT-4.5 preview 1.2 percent, and o3-mini-high with reasoning scored 0.8 percent. It's worth noting o3 and o4-mini weren't included in the current leaderboard.
That's all to say, even industry-standard benchmarks make it difficult to assess hallucination rates.
Then there's the added complexity that models tend to be more accurate when tapping into web search to source their answers. But in order to use ChatGPT search, OpenAI shares data with third-party search providers, and Enterprise customers using OpenAI models internally might not be willing to expose their prompts to that.
Regardless, if OpenAI is saying its brand-new o3 and o4-mini models hallucinate more often than its non-reasoning models, that might be a problem for its users.
UPDATE: Apr. 21, 2025, 1:16 p.m. EDT This story has been updated with a statement from OpenAI.