OpenAI Launches GPT-4o Mini: Outperforms GPT-3.5 with 82% Accuracy in MMLU Tests, Redefines AI Accessibility

Rahul Somvanshi

GPT-4o mini.

OpenAI has begun rolling out a new product called GPT-4o mini to ChatGPT and API (Application Programming Interface) users. This small language model arrives with enhanced textual intelligence, multimodal reasoning, and reduced usage fees for developers, and it is being introduced as the replacement for GPT-3.5 Turbo. The new model scored 82% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark. By comparison, GPT-3.5 scored 70%, GPT-4o scored 88.7%, Claude 3 Haiku scored 75.2%, and Gemini 1.5 Flash scored 78.9%. Google's Gemini Ultra tops the ranking with a score of 90% on this benchmark.

The concept of small-sized models is not new. For instance, Google has Gemini 1.5 Flash and Anthropic has Claude 3 Haiku. These variants, like the one announced today by OpenAI, are not only lighter, faster, and more efficient but also cheaper to use than the more powerful and expensive flagship models. In this latter group, we find GPT-4o, Gemini 1.5 Pro, and Claude 3 Opus.

The language models that the company led by Sam Altman has launched over the past few years are not only present in ChatGPT. They are also a key element in many applications that benefit from their capabilities. However, professional use at this level is not free: developers pay per token, billed in blocks of a million, which makes pricing a decisive factor. Given that there is no single dominant player here, and competition in the AI field has become fierce, companies are trying to offer better deals to developers. We saw this a few months ago when GPT-4o arrived with much lower usage fees than GPT-4 Turbo while keeping the same context length and adding the new model's capabilities.

Now, OpenAI has done it again. GPT-4o mini is priced at 15 cents per million input tokens and 60 cents per million output tokens. For comparison, GPT-4o costs $5 per million input tokens and $15 per million output tokens. Both models offer a 128K-token input context window, the amount of text the model can analyze at once, which caps how much of a large business or legal document can be processed in a single request. Output is limited to 16K tokens per request. The model also has a knowledge cutoff of October 2023, so news, events, and discoveries that occur after this date are unknown to the AI and cannot be used to answer questions.
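To put those per-million-token prices in perspective, here is a rough back-of-the-envelope sketch in Python. The token counts are made-up figures for illustration, not measurements from a real workload:

```python
# Cost estimate using the per-million-token prices quoted above (USD).
GPT_4O_MINI = {"input": 0.15, "output": 0.60}
GPT_4O = {"input": 5.00, "output": 15.00}

def request_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given a model's per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

# Example: a chatbot turn with a 2,000-token prompt and a 500-token reply.
print(f"GPT-4o mini: ${request_cost(GPT_4O_MINI, 2_000, 500):.6f}")  # ~$0.0006
print(f"GPT-4o:      ${request_cost(GPT_4O, 2_000, 500):.6f}")       # ~$0.0175
```

On these assumed numbers, the same request costs roughly 30 times less on GPT-4o mini than on GPT-4o, which is the kind of gap that matters at chatbot scale.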

https://x.com/OpenAI/status/181399170608334079

On a slightly more technical level, GPT-4o mini currently supports text and vision in the API. OpenAI says it will eventually support text, image, video, and audio inputs and outputs. This is a notable advantage over GPT-3.5 Turbo, which was not only more expensive but also limited to text inputs and outputs. GPT-4o mini is aimed at applications that connect to multiple APIs, need to absorb a large amount of code, or interact with customers through quick responses. A real use scenario we can mention is AI-powered customer service chatbots, as in the sketch below.
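As a minimal sketch of that chatbot scenario, the call below uses the OpenAI Python SDK (v1.x) and assumes an API key is set in the OPENAI_API_KEY environment variable; the model identifier "gpt-4o-mini" matches the name in OpenAI's announcement, but check the API docs for the current identifier:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise customer-support assistant."},
        {"role": "user", "content": "My order hasn't arrived yet. What are my options?"},
    ],
    max_tokens=200,  # cap the reply length, and therefore the output cost
)
print(response.choices[0].message.content)
```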

In a direct comparison with GPT-4o, OpenAI's best LLM published in 2024, GPT-4o mini consistently gives less accurate responses. Compared to GPT-3.5 Turbo from 2022, it consistently performs better. Across a range of benchmark tests covering reasoning, math, coding, and multimodal understanding (DROP, HumanEval, MATH, MathVista, MGSM, MMLU, and MMMU), the model answers roughly 60 to 80% of questions correctly. Only on the PhD-level GPQA test does its accuracy drop to about 40%, which is only slightly better than a non-expert person searching for the answer on the Internet.

https://x.com/OpenAIDevs/status/1813990750851612830

A large language model (LLM) is created by training on millions of documents and forms the basis of an AI chatbot like ChatGPT. The model encodes, as mathematical vectors, how likely words, images, and other tokens are to appear together. For example, the probability of "ice" appearing next to "cream" is much higher than that of it appearing next to "stone." However, a large LLM uses a lot of computing power and energy to respond to user requests, resulting in high costs for users. Shrinking an LLM makes it cheaper and more eco-friendly, with the trade-off of less precise responses.
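The "ice cream" example can be made concrete with a deliberately simplified toy in Python. This is only an illustration of the co-occurrence idea, not how a real LLM stores or computes its knowledge:

```python
# Toy "model": a hand-written table of next-word probabilities.
# A real LLM learns billions of parameters instead of a small lookup table.
next_word_probs = {
    "ice": {"cream": 0.62, "cold": 0.21, "age": 0.16, "stone": 0.01},
}

def most_likely_next(word: str) -> str:
    """Return the highest-probability continuation for a given word."""
    candidates = next_word_probs.get(word, {})
    return max(candidates, key=candidates.get) if candidates else "<unknown>"

print(most_likely_next("ice"))  # -> "cream"
```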

ChatGPT users on the Free, Plus, and Team plans will also benefit from GPT-4o mini, with the new model available to them starting today. Enterprise customers will have to wait a little longer: GPT-4o mini begins rolling out to them next week.
