OpenAI launched its newest artificial intelligence models yesterday—o3 and o4-mini—advancing the company’s push into what it calls “reasoning” technology. These models represent a direct upgrade to their predecessors (o1 and o3-mini) while offering a substantial cost reduction for similar or improved performance.
What Sets These Models Apart
The main selling point: these are the first OpenAI reasoning models that can use every ChatGPT tool simultaneously. While previous versions were good at thinking through problems step-by-step, these new models can now:
- Search the web for current information
- Run Python code to analyze data or create visualizations
- Interpret images (even low-quality ones)
- Generate images when needed
“These are the smartest models we’ve released to date,” OpenAI stated in its announcement. “For the first time, our reasoning models can agentically use and combine every tool within ChatGPT.”
Cost vs. Performance: A Notable Shift
In an unusual move for tech upgrades, OpenAI has cut prices while improving capabilities:
- o3: $10 per million input tokens/$40 per million output tokens (33% cheaper than o1)
- o4-mini: $1.10 per million input tokens/$4.40 per million output tokens (same price as o3-mini)
The company claims these models deliver better results per dollar than their predecessors. For instance, on the 2025 AIME math competition, o4-mini scored 92.7% accuracy—outperforming o3-mini while costing the same to run.
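Plugging the listed rates into a quick calculation makes the cost comparison concrete. A minimal sketch in Python (the per-million-token prices come from the list above; the token counts are made-up illustration values, not benchmark figures):

```python
# Per-million-token prices (USD) as listed in the announcement.
PRICES = {
    "o3": {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10, "output": 4.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single API call at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical request: a 10k-token prompt with a 2k-token response.
print(request_cost("o3", 10_000, 2_000))       # → 0.18
print(request_cost("o4-mini", 10_000, 2_000))  # → 0.0198
```

At these rates the same request costs roughly 9× less on o4-mini than on o3, which is the trade-off OpenAI is pitching between the two tiers.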
Visual Reasoning: A Key Advancement
Perhaps the most practical innovation is what OpenAI calls “thinking with images.” These models don’t just see an image—they incorporate it into their reasoning process. Users can upload:
- Whiteboard photos
- Hand-drawn sketches
- Textbook diagrams
- Low-quality images
The system can then manipulate these visuals (zoom, rotate) while thinking through problems, allowing for more natural visual problem-solving.
Early Expert Reactions
Some early users have expressed enthusiasm about the capabilities. Dr. Derya Unutmaz, an immunologist, wrote on X that o3 appears “at or near genius level” and added: “It’s generating complex incredibly insightful and based scientific hypotheses on demand! When I throw challenging clinical or medical questions at o3, its responses sound like they’re coming directly from top subspecialist physicians.”
Wharton professor Ethan Mollick compared o3 to Google’s Gemini 2.5 Pro on Bluesky, writing: “After using them both, I think that Gemini 2.5 & o3 are in a similar sort of range… Each has its own quirks & you will likely prefer one to another, but there is a gap between them & other models.”
During the announcement livestream, OpenAI President Greg Brockman claimed: “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas.”
Cautions and Limitations
Independent testing by research lab Transluce found that a pre-release version of o3 sometimes produced fabricated information, such as claiming to run code locally or providing made-up hardware specifications.
“It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities,” Transluce noted on X.
OpenAI acknowledges that for certain benchmark tests, it implemented domain blocks and monitoring to prevent the models from simply finding answers online.
Accessing the New Models
- ChatGPT Plus, Pro, and Team users: Available now (replacing o1, o3-mini, and o3-mini-high)
- Enterprise and Edu users: Access in one week
- Free users: Can try o4-mini by selecting ‘Think’ before submitting queries
- Developers: Available through Chat Completions API and Responses API
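For developers, calling the new models should look the same as calling earlier o-series models, with only the model name changed. A minimal sketch using the official `openai` Python SDK's Chat Completions interface (the model name comes from the announcement; the prompt is a placeholder, and an `OPENAI_API_KEY` environment variable is assumed):

```python
import os

# Request parameters for the new reasoning models; relative to earlier
# o-series calls, only the value of "model" changes.
params = {
    "model": "o4-mini",  # or "o3"
    "messages": [
        # Placeholder prompt for illustration only.
        {"role": "user", "content": "Summarize the key result of this paper."},
    ],
}

# Only attempt a live call when an API key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # requires the `openai` package

    client = OpenAI()
    response = client.chat.completions.create(**params)
    print(response.choices[0].message.content)
```

The Responses API mentioned above accepts the same model names; which endpoint to use depends on the application, and the SDK documentation covers the differences.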
OpenAI CEO Sam Altman tweeted that “we expect to release o3-pro to the pro tier in a few weeks.”
Codex CLI: A Side Project
Alongside these models, OpenAI released Codex CLI, an open-source terminal tool that connects its AI models to users’ local computers and code. The company launched a $1 million grant program offering API credits for projects using this tool.
Codex CLI bears similarities to Anthropic’s Claude Code, released in February, suggesting a competitive push into coding assistance tools.
What Comes Next?
These releases continue OpenAI’s often confusing naming convention. As tech writer Timothy B. Lee noted on X: “It’s an amazing branding decision to have a model called GPT-4o and another one called o4.”
Altman acknowledged the criticism, writing: “How about we fix our model naming by this summer and everyone gets a few more months to make fun of us (which we very much deserve) until then?”
Looking ahead, OpenAI indicated these might be its last standalone reasoning models before GPT-5, which aims to unify traditional GPT models with reasoning capabilities in a single system.