Google has released Gemma 4, its latest family of open AI models, built on the same underlying research as its proprietary Gemini 3 systems. The models come in four sizes — E2B, E4B, 26B Mixture of Experts (MoE), and 31B Dense — each targeting a different class of hardware, from smartphones and Raspberry Pi boards to developer workstations with NVIDIA GPUs. Since Gemma’s first release in February 2024, the models have been downloaded over 400 million times, with a developer community that has produced more than 100,000 variants — what Google calls the “Gemmaverse.”
What sets this release apart from its predecessors is a combination of expanded local capabilities and a major licensing change. Gemma 4 ships under the Apache 2.0 licence — something developers had long requested, given that the previous custom Google licence placed restrictions that made commercial use complicated and unpredictable. Google DeepMind CEO Demis Hassabis described the models as “the best open models in the world for their respective sizes.” Hugging Face co-founder Clément Delangue called the Apache 2.0 switch “a huge milestone.”
The 31B Dense model ranks #3 among all open models globally on the Arena AI text leaderboard (ELO 1452), while the 26B MoE ranks #6 (ELO 1441) — both outperforming models many times their size. For NVIDIA RTX hardware and next-generation PC platforms, Gemma 4 brings genuinely usable frontier-class AI without a cloud subscription.
From Your Phone to Your Workstation —
Gemma 4 Runs Where You Are
Four model sizes. One licence that actually makes sense. An interactive look at what Gemma 4 is, what it can do, and which version is right for your hardware.
Key Facts at a Glance
All figures sourced directly from Google’s official launch announcements and the Arena AI leaderboard.
Which Gemma 4 Model Is Which?
Tap a model below to see its specs, target hardware, and key capabilities. Each size is built for a different use case — choose the one that fits your device.
Effective 2B activates only ~2 billion parameters during inference. It is designed to run completely offline with near-zero latency on phones, Raspberry Pi 5, and Jetson Nano hardware. Speed is its primary trait — E2B runs 3× faster than E4B on the same device.
- Active parameters: ~2B (effective)
- Context window: 128K tokens
- Memory (LiteRT, 4-bit): <1.5 GB
- Hardware targets: Android, iOS, Raspberry Pi 5, Jetson Nano
- Audio input: ✓ Native ASR
- Speed vs E4B: 3× faster
Key Capabilities
- Runs fully offline on Android and iOS — no internet required
- Native audio input for speech recognition and understanding
- Processes images and video natively
- Up to 4× faster with 60% lower battery use vs the previous Gemma on Android
- Forward-compatible with Gemini Nano 4 coming to flagship Android devices later in 2026
- Available now in Google AI Edge Gallery
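The sub-1.5 GB LiteRT figure is easy to sanity-check with back-of-envelope arithmetic: weight memory is roughly parameters × bits per weight. A minimal sketch — the helper below is illustrative, and it ignores activation buffers and the KV cache, which grow with context length:

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate: parameters x bits per weight, in GB.

    This is a lower bound: activation buffers and the KV cache (which
    scales with context length) come on top of the weights.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# ~2B active parameters at 4-bit, matching the LiteRT figure cited above:
e2b = quantized_weight_gb(2.0, 4)
print(f"E2B @ 4-bit: ~{e2b:.1f} GB of weights")
```

The ~1.0 GB of weights this yields leaves headroom under the quoted 1.5 GB envelope for the runtime and a modest KV cache.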
Effective 4B provides more reasoning power than E2B while maintaining a mobile-first footprint. Built in close collaboration with Qualcomm Technologies and MediaTek, it targets mid-range to high-end smartphone NPUs.
- Active parameters: ~4B (effective)
- Context window: 128K tokens
- Hardware targets: Android, iOS, tablets, Jetson Nano
- NPU support: Qualcomm & MediaTek optimised
- Audio input: ✓ Native ASR
- Access: AI Edge Gallery & AICore Developer Preview
Key Capabilities
- Native function calling for on-device agentic workflows
- Structured JSON output for reliable app integration
- OCR and document understanding on mobile
- Multilingual speech recognition across 140+ languages
- Base model for the upcoming Gemini Nano 4 on Android
- Prototype now via the AICore Developer Preview
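Native function calling and structured JSON output matter because the host app can validate the model’s reply mechanically before acting on it. A minimal sketch of that contract — the `set_alarm` tool schema and reply format here are hypothetical illustrations, not the actual Gemma 4 calling convention:

```python
import json

# Hypothetical tool schema -- the real function-calling format is
# model- and runtime-specific; this only shows the shape of the
# contract an app can enforce on structured output.
SET_ALARM = {
    "name": "set_alarm",
    "parameters": {"time": "string (HH:MM)", "label": "string"},
}

def parse_tool_call(model_output: str) -> dict:
    """Validate that the model's reply is a well-formed set_alarm call."""
    call = json.loads(model_output)  # raises ValueError on malformed JSON
    if call["name"] != SET_ALARM["name"]:
        raise ValueError(f"unexpected tool: {call['name']}")
    missing = set(SET_ALARM["parameters"]) - set(call["arguments"])
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call

reply = '{"name": "set_alarm", "arguments": {"time": "07:30", "label": "run"}}'
call = parse_tool_call(reply)
print(call["arguments"])
```

Because the reply either parses into a complete, named call or raises, downstream app code never has to guess at free-form text.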
The 26B Mixture of Experts model contains 26 billion total parameters but activates only 3.8 billion per inference pass. This makes it exceptionally fast — delivering performance close to the 31B Dense model at significantly lower compute cost. It ranks #6 among open models on Arena AI (ELO 1441).
- Total parameters: 26B (MoE)
- Active per inference: 3.8B
- Arena AI ranking: #6 open (ELO 1441)
- Context window: 256K tokens
- Full-precision target: single 80GB NVIDIA H100
- Priority: speed & throughput
Key Capabilities
- High tokens-per-second for real-time agent workflows
- Native function calling and tool use
- Offline code generation for local IDE integration
- Chart understanding and visual document analysis
- Quantised versions fit on consumer NVIDIA RTX GPUs
- Available via Ollama, llama.cpp, and Google AI Studio
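The MoE speed advantage follows directly from the numbers above: only 3.8B of 26B parameters are touched per token. A rough sketch, using the common ~2 FLOPs-per-active-parameter-per-token approximation for decoder inference (an approximation, not a measured figure for this model):

```python
def moe_stats(total_b: float, active_b: float) -> tuple[float, float]:
    """Fraction of weights active per token, and rough FLOPs per token.

    Uses the standard ~2 FLOPs per active parameter per generated token
    estimate for decoder-only inference.
    """
    fraction = active_b / total_b
    flops_per_token = 2 * active_b * 1e9
    return fraction, flops_per_token

frac, flops = moe_stats(26, 3.8)
print(f"{frac:.1%} of weights active, ~{flops / 1e9:.1f} GFLOPs per token")
```

Under this approximation the 26B MoE does less than a sixth of the per-token compute of a dense 26B model, which is why its throughput lands close to much smaller dense models while its quality tracks the 31B flagship.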
The flagship variant uses all 31 billion parameters on every inference pass, maximising output quality. It ranks #3 among open models on Arena AI (ELO 1452), outperforming models 20× its size. Positioned by Google as the prime candidate for fine-tuning.
- Total parameters: 31B (Dense)
- Arena AI ranking: #3 open (ELO 1452)
- Context window: 256K tokens
- GPQA Diamond score: 85.2%
- AIME 2026 score: 89.2% (no tool use)
- Full-precision target: single 80GB NVIDIA H100
Key Capabilities
- State-of-the-art math and instruction-following
- Strong foundation for domain-specific fine-tuning
- Local high-quality code generation — fully offline
- Multi-step planning for complex agentic tasks
- Available in Google AI Studio today
- Runs quantised (Q4) on NVIDIA RTX 4090 (~20GB VRAM)
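The ~20GB Q4 figure for an RTX 4090 is consistent with simple arithmetic: Q4-style quantisation stores roughly 4.5 bits per weight once per-block scales are included, plus a few GB for the KV cache and runtime buffers. A rough sketch — both constants below are assumptions, not measurements of any specific runtime:

```python
def q4_vram_gb(params_b: float, overhead_gb: float = 4.0) -> float:
    """Estimate VRAM for a Q4-quantised model.

    Assumes ~4.5 effective bits per weight (Q4_K-style quants store
    per-block scales alongside the 4-bit values) plus a flat allowance
    for KV cache and runtime buffers. Both numbers are rough assumptions.
    """
    weights_gb = params_b * 1e9 * 4.5 / 8 / 1e9
    return weights_gb + overhead_gb

print(f"31B @ Q4: ~{q4_vram_gb(31):.1f} GB")  # in the ballpark of ~20 GB
```

That lands just over 21 GB with these assumptions, within the 24 GB of an RTX 4090 and in line with the ~20GB figure quoted above.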
Arena AI Leaderboard: Open Models vs Parameter Count
Gemma 4’s 31B model competes with models carrying hundreds of billions of parameters. The ELO scores below are taken from the Arena AI text leaderboard as of April 1, 2026.
ELO scores per Google DeepMind’s official blog and Arena AI (April 1, 2026). Gemma 4 31B (31B total parameters) ranks #3, alongside Kimi-K2.5 (744B) and GLM-5 (1 trillion parameters). The 26B MoE activates only 3.8B parameters at inference time.
Core Capabilities Across All Four Models
Every Gemma 4 variant ships with these capabilities out of the box — no fine-tuning required for standard use cases.
Official Statements
Statements from official press releases and verified public posts.
“The best open models in the world for their respective sizes.”
— Demis Hassabis, CEO, Google DeepMind (Official Launch Statement, April 2026)

“Incredible amount of intelligence per parameter.”

— Sundar Pichai, CEO, Google (Official Launch Statement, April 2026)

“A huge milestone.”

— Clément Delangue, Co-founder, Hugging Face (on the Apache 2.0 licence switch, April 2026)

Which Gemma 4 Should You Use?
Answer two quick questions to find the model that fits your situation. Based purely on Google’s official hardware and use-case guidance.
1. What kind of device will you run it on?
The Gemma Story So Far
Gemma 4 is the fourth major generation of Google’s open model family. Here’s the path from first release to today.
What the Apache 2.0 Switch Actually Changes
The previous custom Gemma licence was the single biggest reason enterprise teams avoided building on Gemma. Apache 2.0 removes that friction entirely.
Gemma 4 Runs Across This Entire Spectrum
From a Raspberry Pi drawing 5W to an H100 data-centre GPU — Gemma 4 has a variant designed for each tier.
- Smartphones & edge devices: E2B & E4B, Gemini Nano 4 ready
- Linux, macOS, Raspberry Pi: CPU & GPU support
- Consumer GPUs: 26B & 31B quantised
- Data centre & supercomputer: single GPU, bfloat16
- Browser: WebGPU execution
Gemma 4, as covered here, comprises four open AI models released by Google on April 2, 2026, built on the same research base as its proprietary Gemini 3 systems. The release included a shift from a restrictive custom licence to Apache 2.0, four model sizes (E2B, E4B, 26B MoE, 31B Dense), and capabilities including native function calling, multimodal input processing, and support for over 140 languages.
The 31B Dense model holds the #3 spot on the Arena AI open model leaderboard (ELO 1452), and the 26B MoE holds #6 (ELO 1441) — both as of the April 1, 2026 Arena snapshot cited in Google’s official launch announcement. The 26B MoE activates only 3.8 billion of its 26 billion parameters during inference, keeping hardware requirements manageable for developer workstations.
Google confirmed that Gemma 4 E2B and E4B serve as the base for Gemini Nano 4, expected to arrive on flagship Android devices later in 2026. The models were discussed in the context of NVIDIA’s RTX hardware ecosystem, next-generation PC platforms, and the broader trend of local AI processing covered across Apple and platform developments this year. Model weights are downloadable from Hugging Face, Kaggle, and Ollama.