Grok-4 Outscores Rivals by 17% on Key AI Tests: xAI Launches $300 Premium Tier Amid Controversy

Rahul Somvanshi

Elon Musk’s artificial intelligence venture, xAI, officially launched Grok-4 on Wednesday, July 9th, following a livestream announcement that began more than an hour behind schedule. The new AI model arrives during a turbulent week for Musk’s companies, just hours after X CEO Linda Yaccarino resigned and amid fallout from antisemitic content generated by earlier Grok versions.

“With respect to academic questions, Grok-4 is better than PhD level in every subject, no exceptions,” Musk claimed during the livestream. “At times, it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time.”

The release introduces two models: standard Grok-4 and Grok-4 Heavy, which xAI describes as a “multi-agent version” offering enhanced performance. Grok-4 Heavy reportedly uses multiple AI agents working simultaneously on problems “like a study group” to find optimal answers.

Grok-4 Heavy is available only through a new premium subscription tier called "SuperGrok Heavy," priced at $300 per month, which makes it the most expensive AI subscription among major providers. The tier also includes standard Grok-4 and targets enterprises and power users who need advanced capabilities for complex tasks.

Earlier in the week, xAI faced criticism after Grok's automated account on X posted antisemitic content, including praise for Hitler. The company temporarily disabled Grok's posting abilities and reportedly modified its system prompts. During the launch event, Musk and xAI largely avoided discussing these controversies.

Grok-4 was trained on xAI’s “Colossus” supercomputer using 200,000 NVIDIA H100 GPUs, enabling a claimed 100x increase in training compute and data compared to Grok-2.

The company presented benchmark results showing Grok-4 outperforming competitors on several academic tests. According to xAI, Grok-4 scored 25.4% on Humanity's Last Exam without tools, ahead of Google's Gemini 2.5 Pro (21.6%) and OpenAI's o3 (21%). That score is roughly 17.6% higher than Gemini 2.5 Pro's in relative terms. With tools enabled, Grok-4 Heavy reportedly achieved 44.4% on the same test.

The nonprofit Arc Prize reported that Grok-4 scored 16.2% on its ARC-AGI-2 test, nearly doubling the previous commercial state-of-the-art score. Benchmark results leaked ahead of the launch also suggested strong performance on graduate-level physics questions (GPQA) and mathematics exams (AIME).

Key features include improved reasoning and coding capabilities, with multimodal support for text, images, and voice. The model includes “DeepSearch” for real-time internet access, particularly integrating with X’s platform for live updates. Users can also customize voice tones and responses.

For developers, xAI has made Grok-4 available through its API with a context window of up to 256K tokens. The company also outlined upcoming releases: a specialized coding model in August, a multimodal agent in September, and video generation capabilities in October.

While Grok-4 remains proprietary, xAI announced plans to release smaller, open-source variants later in 2025 to support broader research and development. The launch intensifies competition in the AI sector as OpenAI prepares to release GPT-5 later this summer.

Musk’s bold predictions for Grok-4 include the possibility of discovering new technologies by the end of 2025 and potentially “new physics” within two years, though he acknowledged the current model’s limitations in common sense reasoning and technological innovation.
