The words flow like endless rain: summing up a busy week in LLM news

Portrait of a boy amazed by flying letters.

Some weeks in AI news are eerily quiet, but during others, keeping up with the week's events feels like trying to hold back the tide. This week saw three notable large language model (LLM) releases: Google's Gemini Pro 1.5 reached general availability with a free tier, OpenAI shipped a new version of GPT-4 Turbo, and Mistral released a new openly licensed LLM, Mixtral 8x22B. All three launches took place within 24 hours starting Tuesday.

With the help of software engineer and independent AI researcher Simon Willison (who also wrote about this week's frantic LLM launches on his own blog), we'll briefly cover each of the three main events in roughly chronological order, and then dive into some additional AI events from this week.

Gemini Pro 1.5 General Release

On Tuesday morning Pacific time, Google announced that its Gemini 1.5 Pro model (which we first covered in February) is now available in more than 180 countries, excluding Europe, via the Gemini API in public preview. This is the most powerful public LLM Google has offered to date, and it's available in a free tier that allows up to 50 requests per day.

It supports up to 1 million tokens of input context. As Willison notes on his blog, the API price for Gemini 1.5 Pro, at $7/million input tokens and $21/million output tokens, is slightly lower than GPT-4 Turbo's (priced at $10/million in and $30/million out) and higher than Claude 3 Sonnet's (Anthropic's mid-tier LLM, priced at $3/million in and $15/million out).
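To make that price gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a single request. The workload (100,000 input tokens and 2,000 output tokens) and the helper name `request_cost` are illustrative assumptions, not figures from Willison or from any vendor.

```python
# Per-million-token API prices in USD (input, output), as quoted above.
PRICES = {
    "Gemini 1.5 Pro": (7.00, 21.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a long document in, a short summary out.
    for model in PRICES:
        cost = request_cost(model, input_tokens=100_000, output_tokens=2_000)
        print(f"{model}: ${cost:.3f}")
```

Under those assumed token counts, the request works out to roughly $0.74 on Gemini 1.5 Pro, $1.06 on GPT-4 Turbo, and $0.33 on Claude 3 Sonnet. Because input tokens dominate a long-context request like this one, the input price matters far more than the output price.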

It is worth noting that Gemini 1.5 Pro includes native audio (speech) input processing that allows users to upload audio or video prompts, a new File API for handling files, the ability to add custom system instructions (system prompts) to guide model responses, and a JSON mode for structured data extraction.

Launch of “significantly improved” GPT-4 Turbo

GPT-4 Turbo performance chart provided by OpenAI.

Shortly after Google launched 1.5 Pro on Tuesday, OpenAI announced it was rolling out a “significantly improved” version of GPT-4 Turbo (a model family originally launched in November) called “gpt-4-turbo-2024-04-09.” It integrates multimodal GPT-4 Vision processing (recognizing the contents of images) directly into the model, and it initially launched through API access only.

Then on Thursday, OpenAI announced that the new GPT-4 Turbo model was becoming available to paid ChatGPT users. OpenAI said the new model improves “abilities in writing, mathematics, logical thinking and coding” and shared a chart that is not particularly useful for judging capabilities (which the company later updated). The company also provided an example of the claimed improvement, saying that when writing with ChatGPT, the AI assistant will be “more direct, less verbose, and use more conversational language.”

The vague nature of OpenAI's GPT-4 Turbo announcements attracted some confusion and criticism online. On the 10th, Willison wrote, “Who will be the first LLM provider to publish really useful release notes?” In some ways, this is a case of “AI vibes” again, as we discussed in our lament on the poor state of LLM benchmarks during Claude 3's debut. “I didn't actually detect any specific differences in quality [related to GPT-4 Turbo],” Willison told us directly in an interview.

The update also extended GPT-4's knowledge cutoff to April 2024, although some people reported that it achieves this through hidden web searches in the background, and others on social media reported issues with date-related confusion.

Mistral's mysterious Mixtral 8x22B release

Illustration of a robot holding a French flag, metaphorically reflecting the rise of artificial intelligence in France due to Mistral. It's difficult to draw a portrait of an LLM, so a robot will have to do.

Not to be outdone, on Tuesday evening the French artificial intelligence company Mistral launched its latest openly licensed model, Mixtral 8x22B, through a tweet with a torrent link that was devoid of any documentation or commentary, as it did with previous releases.

The new mixture-of-experts (MoE) release has a larger parameter count than the company's previously most capable open model, Mixtral 8x7B, which we covered in December. It's rumored to potentially be as capable as GPT-4 (in what way? Vibes). But that remains to be seen.

“Evaluations are still ongoing, but the biggest open question now is how well Mixtral 8x22B shapes up,” Willison told Ars. “If it's in the same quality class as GPT-4 and Claude 3 Opus, we'll finally have an openly licensed model that doesn't fall significantly behind the best proprietary models.”

This release has Willison very excited. “If this thing is really GPT-4 class, it's wild, because you can run it on an (expensive) laptop,” he said. “I think you need a 128GB RAM MacBook for that, which is twice what I own.”

Willison noted that the new Mixtral model has not been included in Chatbot Arena yet, because Mistral has not released a fine-tuned chat model yet. It is still a raw LLM that predicts the next token. “There is now at least one community instruction-tuned version,” Willison says.

Changes to the leaderboard in Chatbot Arena

Screenshot of Chatbot Arena Leaderboard taken on April 12, 2024.

Benj Edwards

This week's LLM news isn't just about the big names in the field. There have also been rumblings on social media about the rising performance of open source models such as Cohere's Command R+, which reached position 6 on the LMSYS Chatbot Arena Leaderboard, the highest ranking ever for an open-weights model.

And for more excitement in the Chatbot Arena, it seems that the new version of GPT-4 Turbo has proven itself competitive with Claude 3 Opus. The two are still in a statistical tie, but GPT-4 Turbo has recently pulled ahead numerically. (In March, we reported when Claude 3 first pulled ahead of GPT-4 Turbo numerically, which was the first time another AI model had beaten a GPT-4 family model on the leaderboard.)

Regarding this fierce competition between LLMs – which most of the muggle world is unaware of and likely never will be – Willison told Ars: “The last couple of months have been a whirlwind – we finally have not just one, but several models capable of competing with GPT-4.” We'll see if OpenAI's rumored launch of GPT-5 later this year will restore the company's technological lead, which once seemed insurmountable. But for now, Willison says, “OpenAI is no longer the undisputed leader in LLMs.”
