Google has released Gemini 2.5 Flash, a new AI model that lets users decide how much “thinking” their AI should do and pay only for the reasoning they actually use. The mid-April 2025 launch marks a shift toward pricing AI by how much reasoning it performs.
The model introduces what Google calls a “thinking budget,” allowing developers to set limits on how deeply the AI reasons through problems. When you need simple answers, you can turn thinking off completely. For complex problems, you can dial up the reasoning power.
“We know cost and latency matter for a number of developer use cases, and so we want to offer developers the flexibility to adapt the amount of thinking the model does, depending on their needs,” said Tulsee Doshi, Product Director for Gemini Models at Google DeepMind.
The Cost of Thinking
The price difference between responses generated with and without thinking is significant. Input costs stay at $0.15 per million tokens either way, but output costs jump from $0.60 per million tokens with thinking off to $3.50 with thinking on, nearly six times as much.
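To see what that gap means in practice, here is a back-of-the-envelope sketch in Python. The request sizes are invented for illustration, and the calculation simply applies the per-million-token rates quoted above; it does not model how thinking tokens themselves are metered.

```python
# Rough cost comparison at the published rates (USD per million tokens).
# The request sizes below are made-up illustration values.
INPUT_RATE = 0.15            # input, with or without thinking
OUTPUT_RATE_PLAIN = 0.60     # output, thinking off
OUTPUT_RATE_THINKING = 3.50  # output, thinking on

def request_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimated cost in USD for a single request."""
    output_rate = OUTPUT_RATE_THINKING if thinking else OUTPUT_RATE_PLAIN
    return (input_tokens * INPUT_RATE + output_tokens * output_rate) / 1_000_000

# Example: a 10,000-token prompt that produces 2,000 output tokens.
print(f"thinking off: ${request_cost(10_000, 2_000, thinking=False):.4f}")  # $0.0027
print(f"thinking on:  ${request_cost(10_000, 2_000, thinking=True):.4f}")   # $0.0085
```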
This pricing model reflects how much more computing power is needed when the AI analyzes complex problems step-by-step rather than providing quick responses. Developers can set a thinking budget anywhere from 0 to 24,576 tokens.
What makes this system clever is that the model doesn’t always use its full budget. It automatically judges how much thinking a task requires. A simple question like “How many provinces does Canada have?” needs minimal reasoning, while engineering calculations trigger deeper thinking processes.
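In the Gemini API, the budget is exposed through a thinking configuration option. The sketch below uses the google-genai Python SDK; the preview model name is an assumption based on the launch, so check the current documentation before relying on it. Setting the budget to 0 turns thinking off entirely.

```python
from google import genai
from google.genai import types

# The client reads the API key from the GOOGLE_API_KEY environment variable.
client = genai.Client()

response = client.models.generate_content(
    # Preview model name at launch; assumed here, verify against current docs.
    model="gemini-2.5-flash-preview-04-17",
    contents="A beam spans 6 m and carries a uniform load of 4 kN/m. "
             "What is the maximum bending moment?",
    config=types.GenerateContentConfig(
        # Cap internal reasoning at 1,024 tokens; any value from 0 to 24,576 works.
        # thinking_budget=0 disables thinking and bills output at the cheaper rate.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```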
How Good Is It?
Despite being designed for speed and cost savings, Gemini 2.5 Flash performs well on difficult tasks. It scored 12.1% on Humanity’s Last Exam (HLE), a tough benchmark that tests advanced reasoning. That score beat competitors such as Claude 3.7 Sonnet (8.9%) and DeepSeek R1 (8.6%), though it fell short of OpenAI’s recently launched o4-mini (14.3%).
The model can process up to one million tokens of context, meaning it can handle extremely long documents or conversations. It works with text, images, video, and audio inputs.
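As a sketch of the multimodal side, the same SDK accepts mixed content parts in a single request. The file name and MIME type below are placeholders, and the preview model name is again an assumption.

```python
from google import genai
from google.genai import types

client = genai.Client()  # API key from GOOGLE_API_KEY

# Placeholder image; any PNG or JPEG bytes work the same way.
with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model name
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Write a one-sentence caption for this image.",
    ],
)
print(response.text)
```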
What Can It Do?
Gemini 2.5 Flash is built for tasks that need a balance of smarts and speed: summarizing documents, answering questions in chat systems, pulling specific information from texts, and creating captions for images and videos.
The model’s thinking capabilities shine when handling multi-step problems like complex math or analyzing detailed research questions. When these abilities aren’t needed, users can turn thinking off to save money and speed up response times.
Availability and Industry Impact
Currently available in preview through the Gemini API in Google AI Studio and Vertex AI, the model is also accessible in the Gemini app as “2.5 Flash (Experimental).” Google has not announced when it will be generally available for full production use.
For businesses, this release offers a way to control AI costs while still accessing advanced capabilities when needed. The approach reflects a maturing AI market where companies need to carefully manage expenses as they integrate AI into daily operations.
This release comes during a busy week for Google, which also rolled out Veo 2 video generation capabilities and announced free Gemini Advanced access for U.S. college students until spring 2026.