Google Gemma 4 Ranks #3 Globally, Beats Models 20× Its Size, Now Free Under Apache 2.0

GigaNectar Team

Gemma 4 branded launch graphic (April 2026)

Google has released Gemma 4, its latest family of open AI models, built on the same underlying research as its proprietary Gemini 3 systems. The models come in four sizes — E2B, E4B, 26B Mixture of Experts (MoE), and 31B Dense — each targeting a different class of hardware, from smartphones and Raspberry Pi boards to developer workstations with NVIDIA GPUs. Since Gemma’s first release in February 2024, the models have been downloaded over 400 million times, with a developer community that has produced more than 100,000 variants — what Google calls the “Gemmaverse.”

What sets this release apart from its predecessors is a combination of expanded local capabilities and a major licensing change. Gemma 4 ships under the Apache 2.0 license — something developers had long requested, given that the previous custom Google licence placed restrictions that made commercial use complicated and unpredictable. Google DeepMind CEO Demis Hassabis described the models as “the best open models in the world for their respective sizes.” Hugging Face co-founder Clément Delangue called the Apache 2.0 switch “a huge milestone.”

The 31B Dense model ranks #3 among all open models globally on the Arena AI text leaderboard (ELO 1452), while the 26B MoE ranks #6 (ELO 1441) — both outperforming models many times their size. For NVIDIA RTX hardware and next-generation PC platforms, Gemma 4 brings genuinely usable frontier-class AI without a cloud subscription.

Open Models · Local AI · Apache 2.0

From Your Phone to Your Workstation — Gemma 4 Runs Where You Are

Four model sizes. One licence that actually makes sense. An interactive look at what Gemma 4 is, what it can do, and which version is right for your hardware.

Apache 2.0 License · On-Device AI · Agentic Workflows · 4 Model Sizes · 400M+ Downloads · 140+ Languages
Gemma 4 by the Numbers

Key Facts at a Glance

All figures sourced directly from Google’s official launch announcements and the Arena AI leaderboard.

400M+
Total Gemma downloads since February 2024
100K+
Community-built variants in the Gemmaverse
256K
Token context window (26B & 31B models)
#3 / #6
31B & 26B Arena AI open-model rankings (ELO 1452 / 1441)
Up to 4×
Faster on Android vs previous Gemma edge models
140+
Languages natively supported across all models
Interactive Model Explorer

Which Gemma 4 Model Is Which?

Each model below is listed with its specs, target hardware, and key capabilities. Each size is built for a different use case; choose the one that fits your device.

Edge / Mobile
Gemma 4 E2B

The “Effective 2B” design activates only ~2 billion parameters during inference, letting it run completely offline with near-zero latency on phones, Raspberry Pi 5, and Jetson Nano hardware. Speed is its primary trait: E2B runs 3× faster than E4B on the same device.

  • Active parameters: ~2B (effective)
  • Context window: 128K tokens
  • Memory (LiteRT 4-bit): <1.5 GB
  • Hardware targets: Android, iOS, Raspberry Pi 5, Jetson Nano
  • Audio input: ✓ Native ASR
  • Speed vs E4B: 3× faster

Key Capabilities

  • Runs fully offline on Android and iOS — no internet required
  • Native audio input for speech recognition and understanding
  • Processes images and video natively
  • Up to 4× faster and 60% less battery vs previous Gemma on Android
  • Forward-compatible with Gemini Nano 4 coming to flagship Android devices later in 2026
  • Available now in Google AI Edge Gallery
Edge / Mobile
Gemma 4 E4B

The “Effective 4B” model offers more reasoning power than E2B while maintaining a mobile-first footprint. Built in close collaboration with Qualcomm Technologies and MediaTek, it targets mid-range to high-end smartphone NPUs.

  • Active parameters: ~4B (effective)
  • Context window: 128K tokens
  • Hardware targets: Android, iOS, tablets, Jetson Nano
  • NPU support: Qualcomm & MediaTek optimised
  • Audio input: ✓ Native ASR
  • Access: AI Edge Gallery & AICore Developer Preview

Key Capabilities

  • Native function calling for on-device agentic workflows
  • Structured JSON output for reliable app integration
  • OCR and document understanding on mobile
  • Multilingual speech recognition across 140+ languages
  • Base model for the upcoming Gemini Nano 4 on Android
  • Prototype now via the AICore Developer Preview
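Structured JSON output matters most at the app boundary, where the model's reply is parsed by ordinary code. Below is a minimal sketch of how a host app might validate such a reply before acting on it; the schema and field names are illustrative assumptions, not a Gemma-defined format.

```python
import json

# Hypothetical raw reply from an on-device model call that was asked to
# answer in a fixed JSON shape (schema is illustrative, not official).
raw_reply = '{"intent": "set_alarm", "time": "07:30", "confidence": 0.94}'

REQUIRED_FIELDS = {"intent": str, "time": str, "confidence": float}

def parse_model_json(text: str) -> dict:
    """Parse and type-check a structured-output reply before the app acts on it."""
    data = json.loads(text)  # raises a ValueError subclass on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data

result = parse_model_json(raw_reply)
print(result["intent"])  # set_alarm
```

Validating before dispatch is what makes the "reliable app integration" bullet real: a malformed or off-schema reply fails fast instead of propagating into app state.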
Workstation / Server
Gemma 4 26B MoE

The 26B Mixture of Experts model contains 26 billion total parameters but activates only 3.8 billion per inference pass. This makes it exceptionally fast — delivering performance close to the 31B Dense model at significantly lower compute cost. It ranks #6 among open models on Arena AI (ELO 1441).

  • Total parameters: 26B (MoE)
  • Active per inference: 3.8B
  • Arena AI ranking: #6 open (ELO 1441)
  • Context window: 256K tokens
  • Full precision target: Single 80GB NVIDIA H100
  • Priority: Speed & throughput

Key Capabilities

  • High tokens-per-second for real-time agent workflows
  • Native function calling and tool use
  • Offline code generation for local IDE integration
  • Chart understanding and visual document analysis
  • Quantised versions fit on consumer NVIDIA RTX GPUs
  • Available via Ollama, llama.cpp, and Google AI Studio
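The speed claim follows from simple arithmetic. Under the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, the MoE's 3.8B active parameters need about an eighth of the 31B Dense model's per-token compute. (Memory is another matter: all 26B parameters must still be resident, hence the H100 target at full precision.)

```python
def flops_per_token(active_params: float) -> float:
    # Rule of thumb for decoder inference: ~2 FLOPs per active parameter per token.
    return 2 * active_params

moe_active = 3.8e9   # 26B MoE activates only 3.8B parameters per pass
dense = 31e9         # 31B Dense uses every parameter on every pass

ratio = flops_per_token(dense) / flops_per_token(moe_active)
print(f"31B Dense needs ~{ratio:.1f}x the per-token compute of the 26B MoE")
```

This is why the two models land so close on the leaderboard yet the MoE wins on tokens per second.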
Workstation / Server
Gemma 4 31B Dense

The flagship variant uses all 31 billion parameters on every inference pass, maximising output quality. It ranks #3 among open models on Arena AI (ELO 1452), outperforming models 20× its size. Positioned by Google as the prime candidate for fine-tuning.

  • Total parameters: 31B (Dense)
  • Arena AI ranking: #3 open (ELO 1452)
  • Context window: 256K tokens
  • GPQA Diamond score: 85.2%
  • AIME 2026 score: 89.2% (no tool use)
  • Full precision target: Single 80GB NVIDIA H100

Key Capabilities

  • State-of-the-art math and instruction-following
  • Strong foundation for domain-specific fine-tuning
  • Local high-quality code generation — fully offline
  • Multi-step planning for complex agentic tasks
  • Available in Google AI Studio today
  • Runs quantised (Q4) on NVIDIA RTX 4090 (~20GB VRAM)
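The ~20GB figure is consistent with back-of-envelope quantisation math: 4-bit weights cost about 0.5 bytes per parameter, with the rest going to KV cache and runtime buffers. The overhead number in the sketch below is a rough assumption, not an official figure.

```python
def q4_weight_gb(params: float, bits: int = 4) -> float:
    """Approximate weight memory for a quantised model, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

weights = q4_weight_gb(31e9)   # 31B parameters at 4 bits each -> 15.5 GB
overhead = 4.5                 # rough allowance for KV cache + buffers (assumption)
print(f"~{weights:.1f} GB weights + ~{overhead} GB overhead = ~{weights + overhead:.0f} GB VRAM")
```

That lands right at the 24GB ceiling of an RTX 4090, which is why Q4 is the quoted quantisation level rather than 8-bit.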
Performance Context

Arena AI Leaderboard: Open Models vs Parameter Count

Gemma 4’s 31B model competes with models carrying hundreds of billions of parameters. These are real ELO scores from the Arena AI text leaderboard as of April 1, 2026.

ELO scores per Google DeepMind’s official blog and Arena AI (April 1, 2026). Gemma 4 31B (31B total parameters) ranks #3, alongside Kimi-K2.5 (744B) and GLM-5 (1 trillion parameters). The 26B MoE activates only 3.8B parameters at inference time.

What It Can Do

Core Capabilities Across All Four Models

Every Gemma 4 variant ships with these capabilities out of the box — no fine-tuning required for standard use cases.

🧠
Multi-Step Reasoning
Complex problem-solving with multi-step planning. The 31B scores 89.2% on AIME 2026 and 85.2% on GPQA Diamond — both without tool use.
🤖
Agentic Tool Use
Native function calling and structured JSON output let Gemma 4 connect to external APIs and automate multi-step tasks autonomously.
💻
Offline Code Generation
High-quality code generated entirely on local hardware. Supported in Android Studio’s Agent Mode. No internet connection needed.
🖼️
Vision & Video
All four models natively process images and video at variable resolutions. Reliable at OCR, chart reading, handwriting, and document analysis.
🎙️
Audio Input (E2B / E4B)
E2B and E4B support native audio input for speech recognition and audio understanding — running entirely on-device.
🌍
140+ Languages
Pretrained on over 140 languages. All models handle multilingual text and conversations without additional configuration.
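To make the agentic-tool-use capability above concrete: on the host side, function calling reduces to parsing the model's tool-call JSON and routing it to a local function. A minimal sketch follows, where the tool-call wire format and tool names are assumptions for illustration, not Gemma's documented schema.

```python
import json

# Tools the host app exposes. Names and signatures are illustrative.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stubbed; a real app would call an API

TOOLS = {"get_weather": get_weather}

def dispatch(model_reply: str) -> str:
    """Route a model's tool-call JSON to the matching local function."""
    call = json.loads(model_reply)
    fn = TOOLS[call["name"]]        # KeyError -> unknown tool
    return fn(**call["arguments"])  # TypeError -> bad arguments

# Hypothetical reply from a model asked about the weather:
reply = '{"name": "get_weather", "arguments": {"city": "Porto"}}'
print(dispatch(reply))  # Sunny in Porto
```

In a full agent loop, the function's return value would be fed back to the model as the next turn, letting it chain multiple calls autonomously.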
On Record

Official Statements

Statements from official press releases and verified public posts.

“The best open models in the world for their respective sizes.”

— Demis Hassabis, CEO, Google DeepMind (Official Launch Statement, April 2026)

“Incredible amount of intelligence per parameter.”

— Sundar Pichai, CEO, Google (Official Launch Statement, April 2026)

“A huge milestone.”

— Clément Delangue, Co-founder, Hugging Face (on the Apache 2.0 licence switch, April 2026)
Quick Guide

Which Gemma 4 Should You Use?

In short: for phones, tablets, and single-board computers, Google points to E2B (fastest, smallest) or E4B (more reasoning, NPU-optimised); for workstations and servers, the 26B MoE (speed and throughput) or the 31B Dense (maximum quality and the prime fine-tuning candidate). This reflects Google's official hardware and use-case guidance.

How We Got Here

The Gemma Story So Far

Gemma 4 is the fourth major generation of Google’s open model family. Here’s the path from first release to today.

🌱
FEBRUARY 2024
Gemma 1 Launches
Google’s first open-weight model family launches in 2B and 7B sizes under a custom proprietary licence. The Gemmaverse begins.
📈
JUNE 27, 2024
Gemma 2 Released
Gemma 2 expands the family with improved performance. The community reaches 100 million downloads and 60,000+ variants by this point.
🚀
MARCH 12, 2025
Gemma 3 Arrives
Gemma 3 adds multimodal support (text + images), a 128K context window, and 140+ language support across four sizes (1B–27B). Custom licence still in place.
🔓
APRIL 2, 2026
Gemma 4 — Apache 2.0
Four new models (E2B, E4B, 26B MoE, 31B Dense) launch under the Apache 2.0 licence. Total downloads now exceed 400 million, with 100,000+ community variants. The 31B ranks #3 globally among open models on Arena AI.
Licensing

What the Apache 2.0 Switch Actually Changes

The previous custom Gemma licence was the single biggest reason enterprise teams avoided building on Gemma. Apache 2.0 removes that friction entirely.

⛔ Old Gemma Licence (Gemma 1–3)

  • Prohibited-use policy — Google could update the terms unilaterally at any time
  • Developers had to pass Google’s rules down to all derived projects
  • The restrictions could extend to models trained on synthetic data produced by Gemma
  • Commercial use required navigating case-by-case restrictions

✅ Apache 2.0 (Gemma 4)

  • Commercially permissive — build, sell, and ship products freely
  • Developers retain full control of their data and deployment environment
  • Widely understood by the developer community — no new terms to interpret
  • The licence terms for released weights cannot be changed retroactively
Supported Hardware

Gemma 4 Runs Across This Entire Spectrum

From a Raspberry Pi drawing 5W to an H100 data-centre GPU — Gemma 4 has a variant designed for each tier.

📱
Android Phones
E2B & E4B via AICore
Gemini Nano 4 ready
🍎
iOS Devices
E2B & E4B via LiteRT-LM
CPU & GPU support
🍓
Raspberry Pi 5
E2B via litert-lm CLI
Linux, macOS, Raspberry Pi
🔧
NVIDIA Jetson
Jetson Orin Nano
E2B & E4B
🖥️
Consumer GPUs
RTX 4090 etc.
26B & 31B quantised
DGX Spark
NVIDIA personal AI supercomputer
🏭
H100 (80GB)
31B & 26B unquantised
single GPU, bfloat16
🌐
Browser
transformers.js
WebGPU execution

To recap: Gemma 4 is a family of four open AI models released by Google on April 2, 2026, built on the same research base as its proprietary Gemini 3 systems. The release included a shift from a restrictive custom licence to Apache 2.0, four model sizes (E2B, E4B, 26B MoE, 31B Dense), and capabilities including native function calling, multimodal input processing, and support for over 140 languages.

The 31B Dense model holds the #3 spot on the Arena AI open model leaderboard (ELO 1452), and the 26B MoE holds #6 (ELO 1441) — both as of the April 1, 2026 Arena snapshot cited in Google’s official launch announcement. The 26B MoE activates only 3.8 billion of its 26 billion parameters during inference, keeping hardware requirements manageable for developer workstations.

Google confirmed that Gemma 4 E2B and E4B serve as the base for Gemini Nano 4, expected to arrive on flagship Android devices later in 2026. The launch also fits the broader shift toward local AI processing seen this year across NVIDIA's RTX hardware ecosystem, next-generation PC platforms, and Apple's devices. Model weights are downloadable from Hugging Face, Kaggle, and Ollama.
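For readers who want to try the weights locally once they appear on Ollama, a request to Ollama's local REST endpoint could be built like this. The model tag used here is a hypothetical placeholder, so check the actual tag on ollama.com before use.

```python
import json

# "gemma4:31b" is a hypothetical tag -- verify the real one on ollama.com.
payload = {
    "model": "gemma4:31b",
    "prompt": "Summarise the Apache 2.0 licence in one sentence.",
    "stream": False,  # ask for a single JSON response instead of a token stream
}
body = json.dumps(payload)

# A real call would POST `body` to http://localhost:11434/api/generate,
# Ollama's generate endpoint on its default local port.
print(body)
```

Since everything runs against localhost, no cloud subscription or API key is involved, which is the point of the release.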
