Google has released Gemma 4, its latest family of open AI models, built on the same underlying research as its proprietary Gemini 3 systems. The models come in four sizes — E2B, E4B, 26B Mixture of Experts (MoE), and 31B Dense — each targeting a different class of hardware, from smartphones and Raspberry Pi boards to developer workstations with NVIDIA GPUs. Since Gemma’s first release in February 2024, the models have been downloaded over 400 million times, with a developer community that has produced more than 100,000 variants — what Google calls the “Gemmaverse.”
What sets this release apart from its predecessors is a combination of expanded local capabilities and a major licensing change. Gemma 4 ships under the Apache 2.0 licence — something developers had long requested, given that the previous custom Google licence placed restrictions that made commercial use complicated and unpredictable. Google DeepMind CEO Demis Hassabis described the models as “the best open models in the world for their respective sizes.” Hugging Face co-founder Clément Delangue called the Apache 2.0 switch “a huge milestone.”
The 31B Dense model ranks #3 among all open models globally on the Arena AI text leaderboard (ELO 1452), while the 26B MoE ranks #6 (ELO 1441) — both outperforming models many times their size. For NVIDIA RTX hardware and next-generation PC platforms, Gemma 4 brings genuinely usable frontier-class AI without a cloud subscription.
From Your Phone to Your Workstation —
Gemma 4 Runs Where You Are
Four model sizes. One licence that actually makes sense. An interactive look at what Gemma 4 is, what it can do, and which version is right for your hardware.
Key Facts at a Glance
All figures sourced directly from Google’s official launch announcements and the Arena AI leaderboard.
Which Gemma 4 Model Is Which?
Tap a model below to see its specs, target hardware, and key capabilities. Each size is built for a different use case — choose the one that fits your device.
Effective 2B activates only ~2 billion parameters during inference. It is designed to run completely offline with near-zero latency on phones, Raspberry Pi 5, and Jetson Nano hardware. Speed is its primary trait — E2B runs 3× faster than E4B on the same device.
- Active parameters: ~2B (effective)
- Context window: 128K tokens
- Memory (LiteRT, 4-bit): <1.5 GB
- Hardware targets: Android, iOS, Raspberry Pi 5, Jetson Nano
- Audio input: ✓ Native ASR
- Speed vs E4B: 3× faster
Key Capabilities
- Runs fully offline on Android and iOS — no internet required
- Native audio input for speech recognition and understanding
- Processes images and video natively
- Up to 4× faster with 60% lower battery use vs the previous Gemma on Android
- Forward-compatible with Gemini Nano 4 coming to flagship Android devices later in 2026
- Available now in Google AI Edge Gallery
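The sub-1.5 GB LiteRT figure is easy to sanity-check with back-of-envelope arithmetic: weight memory is roughly parameters × bits per weight. A minimal sketch — the helper below is illustrative, and it ignores activation buffers and the KV cache, which grow with context length:

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate: parameters x bits per weight, in GB.

    This is a lower bound: activation buffers and the KV cache (which
    scales with context length) come on top of the weights.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# ~2B active parameters at 4-bit, matching the LiteRT figure cited above:
e2b = quantized_weight_gb(2.0, 4)
print(f"E2B @ 4-bit: ~{e2b:.1f} GB of weights")
```

The ~1.0 GB of weights this yields leaves headroom under the quoted 1.5 GB envelope for the runtime and a modest KV cache.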
Effective 4B provides more reasoning power than E2B while maintaining a mobile-first footprint. Built in close collaboration with Qualcomm Technologies and MediaTek, it targets mid-range to high-end smartphone NPUs.
- Active parameters: ~4B (effective)
- Context window: 128K tokens
- Hardware targets: Android, iOS, tablets, Jetson Nano
- NPU support: Qualcomm & MediaTek optimised
- Audio input: ✓ Native ASR
- Access: AI Edge Gallery & AICore Developer Preview
Key Capabilities
- Native function calling for on-device agentic workflows
- Structured JSON output for reliable app integration
- OCR and document understanding on mobile
- Multilingual speech recognition across 140+ languages
- Base model for the upcoming Gemini Nano 4 on Android
- Prototype now via the AICore Developer Preview
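Native function calling and structured JSON output matter because the host app can validate the model’s reply mechanically before acting on it. A minimal sketch of that contract — the `set_alarm` tool schema and reply format here are hypothetical illustrations, not the actual Gemma 4 calling convention:

```python
import json

# Hypothetical tool schema -- the real function-calling format is
# model- and runtime-specific; this only shows the shape of the
# contract an app can enforce on structured output.
SET_ALARM = {
    "name": "set_alarm",
    "parameters": {"time": "string (HH:MM)", "label": "string"},
}

def parse_tool_call(model_output: str) -> dict:
    """Validate that the model's reply is a well-formed set_alarm call."""
    call = json.loads(model_output)  # raises ValueError on malformed JSON
    if call["name"] != SET_ALARM["name"]:
        raise ValueError(f"unexpected tool: {call['name']}")
    missing = set(SET_ALARM["parameters"]) - set(call["arguments"])
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call

reply = '{"name": "set_alarm", "arguments": {"time": "07:30", "label": "run"}}'
call = parse_tool_call(reply)
print(call["arguments"])
```

Because the reply either parses into a complete, named call or raises, downstream app code never has to guess at free-form text.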
The 26B Mixture of Experts model contains 26 billion total parameters but activates only 3.8 billion per inference pass. This makes it exceptionally fast — delivering performance close to the 31B Dense model at significantly lower compute cost. It ranks #6 among open models on Arena AI (ELO 1441).
- Total parameters: 26B (MoE)
- Active per inference: 3.8B
- Arena AI ranking: #6 open (ELO 1441)
- Context window: 256K tokens
- Full-precision target: single 80GB NVIDIA H100
- Priority: speed & throughput
Key Capabilities
- High tokens-per-second for real-time agent workflows
- Native function calling and tool use
- Offline code generation for local IDE integration
- Chart understanding and visual document analysis
- Quantised versions fit on consumer NVIDIA RTX GPUs
- Available via Ollama, llama.cpp, and Google AI Studio
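The MoE speed advantage follows directly from the numbers above: only 3.8B of 26B parameters are touched per token. A rough sketch, using the common ~2 FLOPs-per-active-parameter-per-token approximation for decoder inference (an approximation, not a measured figure for this model):

```python
def moe_stats(total_b: float, active_b: float) -> tuple[float, float]:
    """Fraction of weights active per token, and rough FLOPs per token.

    Uses the standard ~2 FLOPs per active parameter per generated token
    estimate for decoder-only inference.
    """
    fraction = active_b / total_b
    flops_per_token = 2 * active_b * 1e9
    return fraction, flops_per_token

frac, flops = moe_stats(26, 3.8)
print(f"{frac:.1%} of weights active, ~{flops / 1e9:.1f} GFLOPs per token")
```

Under this approximation the 26B MoE does less than a sixth of the per-token compute of a dense 26B model, which is why its throughput lands close to much smaller dense models while its quality tracks the 31B flagship.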
The flagship variant uses all 31 billion parameters on every inference pass, maximising output quality. It ranks #3 among open models on Arena AI (ELO 1452), outperforming models 20× its size. Positioned by Google as the prime candidate for fine-tuning.
- Total parameters: 31B (Dense)
- Arena AI ranking: #3 open (ELO 1452)
- Context window: 256K tokens
- GPQA Diamond score: 85.2%
- AIME 2026 score: 89.2% (no tool use)
- Full-precision target: single 80GB NVIDIA H100
Key Capabilities
- State-of-the-art math and instruction-following
- Strong foundation for domain-specific fine-tuning
- Local high-quality code generation — fully offline
- Multi-step planning for complex agentic tasks
- Available in Google AI Studio today
- Runs quantised (Q4) on NVIDIA RTX 4090 (~20GB VRAM)
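The ~20GB Q4 figure for an RTX 4090 is consistent with simple arithmetic: Q4-style quantisation stores roughly 4.5 bits per weight once per-block scales are included, plus a few GB for the KV cache and runtime buffers. A rough sketch — both constants below are assumptions, not measurements of any specific runtime:

```python
def q4_vram_gb(params_b: float, overhead_gb: float = 4.0) -> float:
    """Estimate VRAM for a Q4-quantised model.

    Assumes ~4.5 effective bits per weight (Q4_K-style quants store
    per-block scales alongside the 4-bit values) plus a flat allowance
    for KV cache and runtime buffers. Both numbers are rough assumptions.
    """
    weights_gb = params_b * 1e9 * 4.5 / 8 / 1e9
    return weights_gb + overhead_gb

print(f"31B @ Q4: ~{q4_vram_gb(31):.1f} GB")  # in the ballpark of ~20 GB
```

That lands just over 21 GB with these assumptions, within the 24 GB of an RTX 4090 and in line with the ~20GB figure quoted above.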
Arena AI Leaderboard: Open Models vs Parameter Count
Gemma 4’s 31B model competes with models carrying hundreds of billions of parameters. The ELO scores below are taken from the Arena AI text leaderboard as of April 1, 2026.
ELO scores per Google DeepMind’s official blog and Arena AI (April 1, 2026). Gemma 4 31B (31B total parameters) ranks #3, alongside Kimi-K2.5 (744B) and GLM-5 (1 trillion parameters). The 26B MoE activates only 3.8B parameters at inference time.
Core Capabilities Across All Four Models
Every Gemma 4 variant ships with these capabilities out of the box — no fine-tuning required for standard use cases.
Official Statements
Statements from official press releases and verified public posts.
“The best open models in the world for their respective sizes.”
— Demis Hassabis, CEO, Google DeepMind (Official Launch Statement, April 2026)

“Incredible amount of intelligence per parameter.”

— Sundar Pichai, CEO, Google (Official Launch Statement, April 2026)

“A huge milestone.”

— Clément Delangue, Co-founder, Hugging Face (on the Apache 2.0 licence switch, April 2026)

Which Gemma 4 Should You Use?
Answer two quick questions to find the model that fits your situation. Based purely on Google’s official hardware and use-case guidance.
1. What kind of device will you run it on?
The Gemma Story So Far
Gemma 4 is the fourth major generation of Google’s open model family. Here’s the path from first release to today.
What the Apache 2.0 Switch Actually Changes
The previous custom Gemma licence was the single biggest reason enterprise teams avoided building on Gemma. Apache 2.0 removes that friction entirely.
Gemma 4 Runs Across This Entire Spectrum
From a Raspberry Pi drawing 5W to an H100 data-centre GPU — Gemma 4 has a variant designed for each tier.
- Smartphones & edge devices: E2B & E4B, Gemini Nano 4 ready
- Linux, macOS, Raspberry Pi: CPU & GPU support
- Consumer GPUs: 26B & 31B quantised
- Data centre & supercomputer: single GPU, bfloat16
- Browser: WebGPU execution
Gemma 4, as covered here, comprises four open AI models released by Google on April 2, 2026, built on the same research base as its proprietary Gemini 3 systems. The release included a shift from a restrictive custom licence to Apache 2.0, four model sizes (E2B, E4B, 26B MoE, 31B Dense), and capabilities including native function calling, multimodal input processing, and support for over 140 languages.
The 31B Dense model holds the #3 spot on the Arena AI open model leaderboard (ELO 1452), and the 26B MoE holds #6 (ELO 1441) — both as of the April 1, 2026 Arena snapshot cited in Google’s official launch announcement. The 26B MoE activates only 3.8 billion of its 26 billion parameters during inference, keeping hardware requirements manageable for developer workstations.
Google confirmed that Gemma 4 E2B and E4B serve as the base for Gemini Nano 4, expected to arrive on flagship Android devices later in 2026. The models were discussed in the context of NVIDIA’s RTX hardware ecosystem, next-generation PC platforms, and the broader trend of local AI processing covered across Apple and platform developments this year. Model weights are downloadable from Hugging Face, Kaggle, and Ollama.