Sarvam AI scores 84.3% on OCR benchmark, beats ChatGPT and Google Gemini Pro in Indian language test

🇮🇳 India’s AI Revolution: Sarvam AI

Homegrown innovation beating global giants in OCR and voice technology

India finally has an AI model that competes at a world-class level on India-specific tasks. Sarvam AI, a Bengaluru-based startup founded in August 2023, has built Sarvam Vision, which beats Google Gemini and ChatGPT at reading documents in Indian languages, and Bulbul V3, which excels at AI voice generation. The company builds foundational AI models from scratch in India, with a founding team whose experience spans IIT Bombay, ETH Zurich, Microsoft Research, and IBM Research. The work fits India’s broader push for technological independence.

Sarvam Vision OCR Performance

Sarvam Vision outperformed major global competitors on optical character recognition (OCR) benchmarks, achieving top scores on industry-standard tests of document intelligence and showing that India can compete at the highest level of AI development.

olmOCR-Bench: 84.3% (document-level OCR accuracy)

OmniDocBench v1.5: 93.28% (complex document parsing)

Indic languages: 22 (scheduled Indian languages)

Model	olmOCR-Bench score	Result
Sarvam Vision	84.3%	Winner
Google Gemini Pro	Lower	–
DeepSeek OCR v2	Lower	–
OpenAI ChatGPT	Significantly Lower	–
Anthropic Claude	Lower	–

What Makes It Special?

Sarvam Vision excels at complex layouts, technical tables, and mathematical formulas where traditional OCR systems often struggle because of messy formatting and dense content.

Sovereign AI: Developed entirely in Bengaluru, India, representing true technological independence
Efficient Architecture: 3-billion-parameter state-space model optimized for inference efficiency
Real-World Performance: Handles production-grade workloads with stable performance across diverse document types
Native Multilingual: Built from the ground up for 22 scheduled Indian languages with culturally aware processing
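The article does not explain what a state-space model is or describe Sarvam Vision’s internals. As a generic illustration only, a discrete linear state-space layer updates a fixed-size hidden state once per input step, h_t = A h_{t-1} + B x_t, and reads out y_t = C h_t. Because only that fixed-size state is carried forward, per-step memory does not grow with sequence length, which is one common argument for the inference efficiency of such models. The function names and toy matrices below are hypothetical, and real state-space layers add discretization and learned, often input-dependent, parameters that this sketch omits.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def ssm_scan(A: Matrix, B: Matrix, C: Matrix, inputs: List[Vector]) -> List[Vector]:
    """Toy discrete linear state-space layer (illustrative, not Sarvam's model).

    For each input x_t: h_t = A h_{t-1} + B x_t, then y_t = C h_t.
    Only the fixed-size state h is carried between steps, so memory per step
    does not grow with sequence length.
    """
    state = [0.0] * len(A)  # h_0 = 0
    outputs = []
    for x in inputs:
        state = [ah + bx for ah, bx in zip(matvec(A, state), matvec(B, x))]
        outputs.append(matvec(C, state))
    return outputs


if __name__ == "__main__":
    # Toy 1-D example: one impulse, after which the state halves every step.
    print(ssm_scan([[0.5]], [[1.0]], [[2.0]], [[1.0], [0.0], [0.0]]))
```

Running it prints `[[2.0], [1.0], [0.5]]`: the single impulse enters the state once and then decays by half at each step, while the stored state stays one number long.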

Tech commentator Deedy Das, who earlier questioned the value of building smaller Indic-language models, recently stated: “I was wrong about Sarvam. When I wrote about them a year ago, I felt like the direction to train small Indic language models was wrong. But boy, have they turned it around. They have the best text-to-speech, speech-to-text, and OCR models for Indic languages, and that’s actually really valuable. The pricing is very reasonable.”

Comprehensive Indian Language Coverage

Sarvam AI’s models support 22 scheduled Indian languages for OCR and 11 languages for text-to-speech, addressing a critical gap that global AI labs have largely ignored. This positions the company as a pioneer in making AI accessible across India’s linguistic diversity.

Sarvam Vision OCR – 22 Languages

Hindi, Bengali, Tamil, Telugu, Marathi, Malayalam, Kannada, Gujarati, Odia, Punjabi, Urdu, Assamese, Sanskrit, Konkani, Manipuri, Nepali, Sindhi, Dogri, Kashmiri, Maithili, Santhali, and Bodo

Bulbul V3: AI Voice for India

Bulbul V3, Sarvam’s latest text-to-speech model, delivers natural, expressive audio with 30+ professional voices across 11 Indian languages, with plans to expand to all 22 scheduled languages. It is built for production-ready quality in real-world applications.

Natural Prosody: Generates pauses, emphasis, pacing, and tone modulation for lifelike speech across Indian languages
Low-Latency Streaming: Real-time audio generation and playback capabilities for interactive applications
Professional Quality: Voices sourced from professional voice artists with culturally authentic pronunciation
Cost-Effective: Competitive pricing compared to global alternatives, making it accessible for Indian developers

Pratik Desai, founder of KissanAI, noted: “We use Bulbul as our go-to TTS model for our Indic use cases, and they have just gotten better with each release. Meanwhile, ElevenLabs cost never made sense for Indic or any other languages.”

Sarvam Vision Capabilities

Sarvam Vision is a multimodal vision-language model designed for document intelligence and visual understanding, built with advanced training algorithms and rigorous data curation.

Document Intelligence: High-accuracy OCR for scanned documents, historical collections, and physical archives in 22 Indian languages
Complex Table Parsing: High-fidelity recognition of table structure and cell relationships in financial and technical documents
Chart Interpretation: Structured extraction, description, and analysis of visual data from graphs and diagrams
Mathematical Formulas: Accurate recognition and parsing of complex mathematical expressions in technical literature
Layout Understanding: Semantic layout parsing with reading order detection for complex multi-column documents
Visual Reasoning: Native multilingual capabilities for interpreting charts and illustrations with cultural context
Handwriting Recognition: Processes student handwriting and historical manuscripts in Indian scripts
Scene Text Recognition: OCR in the wild for signboards, notices, and real-world text in diverse environments
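The article lists reading order detection without describing how Sarvam Vision does it, and a production model almost certainly learns this rather than using fixed rules. To show what the task means, here is a deliberately naive, hypothetical heuristic: assign each detected text block to a column by its horizontal center, read columns in order, and read blocks top to bottom within a column. The `Block` type, the `reading_order` function, and the top-left coordinate origin are all assumptions for illustration. The `rtl` flag reflects that some of the listed languages, such as Urdu and Kashmiri, are usually written in right-to-left Perso-Arabic script, where the right-hand column comes first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A detected text region; coordinates assume a top-left origin (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(blocks: List[Block], page_width: float,
                  columns: int = 2, rtl: bool = False) -> List[str]:
    """Naive multi-column reading order (hypothetical heuristic, not Sarvam's method).

    Each block is assigned to a column by its horizontal center; columns are read
    left to right (or right to left when rtl=True), and blocks inside a column are
    read top to bottom.
    """
    col_width = page_width / columns

    def sort_key(b: Block):
        center = (b.x0 + b.x1) / 2
        col = max(0, min(int(center // col_width), columns - 1))
        if rtl:
            col = columns - 1 - col  # right-hand column is read first
        # Blocks on the same line: leftmost first for LTR, rightmost first for RTL.
        tie = -b.x0 if rtl else b.x0
        return (col, b.y0, tie)

    return [b.text for b in sorted(blocks, key=sort_key)]


if __name__ == "__main__":
    page = [
        Block("right column, top", 60, 10, 90, 20),
        Block("left column, bottom", 10, 50, 40, 60),
        Block("left column, top", 10, 10, 40, 20),
    ]
    print(reading_order(page, page_width=100))
    print(reading_order(page, page_width=100, rtl=True))
```

A heuristic like this breaks on headings that span both columns, floating tables, and captions, which is exactly the kind of messy layout the article says learned document models are meant to handle.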

Technology Architecture

The model architecture prioritizes efficiency and accuracy through carefully designed training pipelines and strict data-quality standards.

Model Size: 3-billion-parameter state-space architecture optimized for fast inference and reduced computational costs
Training Data: High-quality synthetic and real-world samples across scientific literature, financial documents, government bulletins, and historical manuscripts
Training Process: Continual pretraining, supervised fine-tuning, and reinforcement learning with verifiable rewards for accuracy
Free Trial: Complete API access free for February 2026 for developers and enterprises to test production workloads
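The article mentions “reinforcement learning with verifiable rewards” but does not say what the rewards are. The general idea is a reward that a program can compute directly against ground truth, with no human or learned judge. As a hypothetical illustration only (the function names and the choice of character error rate are assumptions, not Sarvam’s method), an OCR reward could be one minus the character error rate, clipped at zero, after Unicode NFC normalization and whitespace collapsing. NFC matters for Indic scripts because the same visible letter can be encoded in more than one way, for example a Devanagari letter with a precomposed or a separate nukta.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """NFC-normalize and collapse runs of whitespace to single spaces."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def ocr_reward(prediction: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1 - character error rate, clipped to [0, 1]."""
    pred, ref = normalize(prediction), normalize(reference)
    if not ref:
        return 1.0 if not pred else 0.0
    cer = levenshtein(pred, ref) / len(ref)
    return max(0.0, 1.0 - cer)
```

For example, `ocr_reward("kitten", "sitting")` needs 3 edits against a 7-character reference, giving a reward of about 0.571. Note that the distance counts code points, so a missing vowel sign (matra) costs one edit rather than a whole grapheme; grapheme-level scoring would need a segmenter outside the standard library.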

Try Sarvam Vision Today

Completely free for the entire month of February 2026


The Visionaries Behind Sarvam AI

Founded in August 2023, Sarvam AI combines deep technical expertise, policy insight, and national vision. The founding team brings together academic excellence, industry experience, and government advisory roles to build India’s sovereign AI ecosystem.

Pratyush Kumar

CEO & Co-Founder

PhD from ETH Zurich and bachelor’s degree from IIT Bombay. Previously worked at Microsoft Research and IBM Research and served as adjunct faculty at IIT Madras. Co-founded the AI4Bharat initiative, which builds Indian-language AI tools, and the PadhAI platform for affordable online learning. For over a decade, his work has bridged academic research and real-world applications of language technology.

Vivek Raghavan

Co-Founder

More than two decades of experience in electronic design automation (EDA). Founded and sold two EDA firms and held senior positions at Magma Design Automation, Synopsys, and Avant! Corporation. Served on the AI Committee of the Supreme Court of India, overseeing the rollout of SUVAS. Contributed to fraud detection models for GSTN, advised NPCI, and helped frame the Data Empowerment and Protection Architecture (DEPA).

Industry Recognition

Sarvam AI’s models have received validation from global tech experts and Indian developers alike. The company’s focus on solving India-specific problems while maintaining world-class performance standards has attracted attention from investors, government bodies, and the developer community.

Global Validation: Tech commentator Deedy Das publicly acknowledged underestimating Sarvam, now recognizing their strong position in Indic AI
Sovereign AI Movement: Positioning India alongside the US and China as countries with domestically developed foundation AI systems
National Initiative: Aligned with India Semiconductor Mission 2.0 announced in Budget 2026-2027
Developer Community: Active Discord community for collaboration and feedback on model improvements and use cases

This article covered Sarvam AI’s achievements in OCR and voice technology for Indian languages. The Bengaluru-based startup, founded in August 2023 by Pratyush Kumar and Vivek Raghavan, released the Sarvam Vision and Bulbul V3 models. Sarvam Vision scored 84.3% on olmOCR-Bench and 93.28% on OmniDocBench v1.5, outperforming Google Gemini Pro, ChatGPT, and other global models. The 3-billion-parameter model supports 22 scheduled Indian languages for OCR. Bulbul V3 offers 30+ professional voices across 11 Indian languages for text-to-speech. Both models are available through Sarvam’s API platform, with free access during February 2026.
