Sarvam AI scores 84.3% on OCR benchmark, beats ChatGPT and Google Gemini Pro in Indian language test

GigaNectar Team


🇮🇳 India’s AI Revolution: Sarvam AI

Homegrown innovation beating global giants in OCR and voice technology

India now has an AI model that competes with global leaders on India-specific tasks. Sarvam AI, a Bengaluru-based startup founded in August 2023, built Sarvam Vision, which beats Google Gemini and ChatGPT at reading documents in Indian languages, and Bulbul V3, a text-to-speech model for Indian languages. The company is building foundational AI models from scratch in India, with a team whose backgrounds include IIT Bombay, ETH Zurich, Microsoft Research, and IBM Research. The work aligns with India’s push for technological independence.

Sarvam Vision OCR Performance

Sarvam AI’s Vision model outperformed major global competitors in optical character recognition benchmarks, demonstrating India’s capability to compete at the highest level of AI development. The model achieved top scores on industry-standard tests for document intelligence.

olmOCR-Bench: 84.3% (document-level OCR accuracy)
OmniDocBench v1.5: 93.28% (complex document parsing)
Indic languages: 22 scheduled Indian languages

AI Model              olmOCR-Bench    Status
Sarvam Vision         84.3%           Winner
Google Gemini Pro     (not given)     Lower
DeepSeek OCR v2       (not given)     Lower
OpenAI ChatGPT        (not given)     Significantly Lower
Anthropic Claude      (not given)     Lower
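
To make a benchmark percentage like the ones above more concrete, here is a minimal Python sketch of pass-rate scoring. It assumes a benchmark built from simple pass/fail checks on each document's OCR output. The article does not describe how olmOCR-Bench or OmniDocBench compute their scores, so the check types and the helper names (`contains`, `in_order`, `pass_rate`) are hypothetical and purely illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical check type: takes the OCR output text, returns True on pass.
Check = Callable[[str], bool]


def contains(snippet: str) -> Check:
    """Pass if the snippet appears somewhere in the OCR output."""
    return lambda text: snippet in text


def in_order(first: str, second: str) -> Check:
    """Pass if both snippets appear and `first` comes before `second`."""
    def check(text: str) -> bool:
        i, j = text.find(first), text.find(second)
        return i != -1 and j != -1 and i < j
    return check


def pass_rate(outputs: Dict[str, str], checks: Dict[str, List[Check]]) -> float:
    """Fraction of all checks that pass, pooled across documents."""
    total = 0
    passed = 0
    for doc_id, doc_checks in checks.items():
        text = outputs.get(doc_id, "")  # a missing document fails all its checks
        for check in doc_checks:
            total += 1
            passed += int(check(text))
    return passed / total if total else 0.0


# Illustrative data only, not taken from any real benchmark.
sample_outputs = {
    "invoice": "नमस्ते दुनिया Total: 42",
    "notice": "Header Body",
}
sample_checks = {
    "invoice": [contains("नमस्ते"), contains("Total: 42"), in_order("Total", "नमस्ते")],
    "notice": [contains("Body"), in_order("Header", "Body")],
}
score = pass_rate(sample_outputs, sample_checks)
```

A score reported as a percentage would then be `score * 100`. Real benchmarks may weight document categories differently, so treat this only as an illustration of the idea.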

What Makes It Special?

Sarvam Vision excels at complex layouts, technical tables, and mathematical formulas where traditional OCR systems often struggle because of messy formatting and dense content.

  • Sovereign AI: Developed entirely in Bengaluru, India, representing true technological independence
  • Efficient Architecture: 3-billion-parameter state-space model optimized for inference efficiency
  • Real-World Performance: Handles production-grade workloads with stable performance across diverse document types
  • Native Multilingual: Built from the ground up for 22 scheduled Indian languages with culturally aware processing

Tech commentator Deedy Das, who earlier questioned the value of building smaller Indic-language models, recently stated: “I was wrong about Sarvam. When I wrote about them a year ago, I felt like the direction to train small Indic language models was wrong. But boy, have they turned it around. They have the best text-to-speech, speech-to-text, and OCR models for Indic languages, and that’s actually really valuable. The pricing is very reasonable.”

Comprehensive Indian Language Coverage

Sarvam AI’s models support 22 scheduled Indian languages for OCR and 11 languages for text-to-speech, addressing a gap that global AI labs have largely ignored. This positions the company as an early mover in making AI accessible across India’s linguistic diversity.

Sarvam Vision OCR – 22 Languages

Hindi
Bengali
Tamil
Telugu
Marathi
Malayalam
Kannada
Gujarati
Odia
Punjabi
Urdu
Assamese
Sanskrit
Konkani
Manipuri
Nepali
Sindhi
Dogri
Kashmiri
Maithili
Santhali
Bodo

Bulbul V3: AI Voice for India

The latest text-to-speech model delivers natural, expressive audio with 30+ professional voices across 11 Indian languages, with plans to expand to all 22 scheduled languages. The model focuses on production-ready quality for real-world applications.

  • Natural Prosody: Generates pauses, emphasis, pacing, and tone modulation for lifelike speech across Indian languages
  • Low-Latency Streaming: Real-time audio generation and playback capabilities for interactive applications
  • Professional Quality: Voices sourced from professional voice artists with culturally authentic pronunciation
  • Cost-Effective: Competitive pricing compared to global alternatives, making it accessible for Indian developers

Pratik Desai, founder of KissanAI, noted: “We use Bulbul as our go-to tts model for our Indic use cases, and they have just gotten better with each release. Meanwhile, ElevenLabs cost never made sense for Indic or any other languages.”

Sarvam Vision Capabilities

Sarvam Vision is a multimodal vision-language model designed for document intelligence and visual understanding, built on advanced training algorithms and rigorous data curation.

  • Document Intelligence: High-accuracy OCR for scanned documents, historical collections, and physical archives in 22 Indian languages
  • Complex Table Parsing: Structure and relationship recognition of table cells with high fidelity for financial and technical documents
  • Chart Interpretation: Structured extraction, description, and analysis of visual data from graphs and diagrams
  • Mathematical Formulas: Accurate recognition and parsing of complex mathematical expressions in technical literature
  • Layout Understanding: Semantic layout parsing with reading order detection for complex multi-column documents
  • Visual Reasoning: Native multilingual capabilities for interpreting charts and illustrations with cultural context
  • Handwriting Recognition: Processes student handwriting and historical manuscripts in Indian scripts
  • Scene Text Recognition: OCR in the wild for signboards, notices, and real-world text in diverse environments
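
As a rough illustration of what "reading order detection" for multi-column documents involves, the Python sketch below orders text blocks column by column, then top to bottom. It is a naive geometric heuristic written for this article, not Sarvam Vision's method, which is not described here. The `Block` type, its fields, and the fixed column count are all assumptions.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical text block with a simple bounding box."""
    text: str
    x: float      # left edge
    y: float      # top edge (smaller means higher on the page)
    width: float


def reading_order(blocks: List[Block], page_width: float, columns: int = 2) -> List[Block]:
    """Order blocks column by column, top to bottom within each column."""
    col_width = page_width / columns

    def column_of(block: Block) -> int:
        # Assign a block to the column that contains its horizontal center,
        # clamped to the valid range of column indices.
        center = block.x + block.width / 2
        return max(0, min(int(center // col_width), columns - 1))

    return sorted(blocks, key=lambda b: (column_of(b), b.y))


# Illustrative two-column page, 600 units wide.
page = [
    Block("right-top", 320, 50, 250),
    Block("left-bottom", 30, 400, 250),
    Block("left-top", 30, 50, 250),
    Block("right-bottom", 320, 400, 250),
]
ordered = [b.text for b in reading_order(page, page_width=600)]
```

This heuristic breaks on full-width headings, tables that span columns, and irregular layouts, which is part of why learned layout models are used for complex documents.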

Technology Architecture

The model architecture prioritizes efficiency and accuracy through carefully designed training pipelines and data quality standards.

  • Model Size: 3-billion-parameter state-space architecture optimized for fast inference and reduced computational costs
  • Training Data: High-quality synthetic and real-world samples across scientific literature, financial documents, government bulletins, and historical manuscripts
  • Training Process: Continual pretraining, supervised fine-tuning, and reinforcement learning with verifiable rewards for accuracy
  • Free Trial: Complete API access free for February 2026 for developers and enterprises to test production workloads
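
The "reinforcement learning with verifiable rewards" step listed above relies on rewards that can be computed automatically against a known answer. One plausible form for OCR is a reward based on character error rate. The Python sketch below assumes that form; the article does not say which reward Sarvam uses, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from `a`
                curr[j - 1] + 1,     # insert a character into `a`
                prev[j - 1] + cost,  # substitute (or match at zero cost)
            ))
        prev = curr
    return prev[-1]


def ocr_reward(prediction: str, reference: str) -> float:
    """Reward in [0, 1]: one minus the character error rate, clipped at 0."""
    if not reference:
        # With an empty reference, only an empty prediction is fully correct.
        return 1.0 if not prediction else 0.0
    cer = edit_distance(prediction, reference) / len(reference)
    return max(0.0, 1.0 - cer)


perfect = ocr_reward("भारत", "भारत")
one_error = ocr_reward("abd", "abc")
```

Python strings are compared here by Unicode code point, so a single visible character in an Indic script (a consonant plus vowel signs) may count as several units. A production metric would more likely segment text into grapheme clusters first.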


The Visionaries Behind Sarvam AI

Founded in August 2023, Sarvam AI combines deep technical expertise, policy insight, and national vision. The founding team brings together academic excellence, industry experience, and government advisory roles to build India’s sovereign AI ecosystem.

Pratyush Kumar

CEO & Co-Founder
PhD from ETH Zurich and Bachelor’s from IIT Bombay. Previously worked at Microsoft Research and IBM Research, and served as Adjunct Faculty at IIT Madras. Founded the AI4Bharat initiative, which focuses on Indian-language AI tools, and the PadhAI platform for affordable online learning. For over a decade, his work has bridged academic research and real-world applications in language technologies.

Vivek Raghavan

Co-Founder
Over two decades of experience in Electronic Design Automation (EDA). Founded and sold two EDA firms and held senior positions at Magma Design Automation, Synopsys, and Avant! Corporation. Served on the AI Committee of the Supreme Court of India, overseeing the SUVAS rollout. Contributed to fraud detection models for GSTN, advised NPCI, and helped frame the Data Empowerment and Protection Architecture.

Industry Recognition

Sarvam AI’s models have received validation from global tech experts and Indian developers alike. The company’s focus on solving India-specific problems while maintaining world-class performance standards has attracted attention from investors, government bodies, and the developer community.

  • Global Validation: Tech commentator Deedy Das publicly acknowledged underestimating Sarvam, now recognizing their strong position in Indic AI
  • Sovereign AI Movement: Positioning India alongside the US and China as countries with domestically developed foundation AI systems
  • National Initiative: Aligned with India Semiconductor Mission 2.0 announced in Budget 2026-2027
  • Developer Community: Active Discord community for collaboration and feedback on model improvements and use cases

In summary, Sarvam AI, the Bengaluru-based startup founded in August 2023 by Pratyush Kumar and Vivek Raghavan, has released the Sarvam Vision and Bulbul V3 models. Sarvam Vision scored 84.3% on olmOCR-Bench and 93.28% on OmniDocBench v1.5, outperforming Google Gemini Pro, ChatGPT, and other global models. The 3-billion-parameter model supports 22 scheduled Indian languages for OCR. Bulbul V3 offers 30+ professional voices across 11 Indian languages for text-to-speech. The models are available through Sarvam’s API platform, with free access during February 2026.
