
Meta's Vanilla Maverick AI Model Ranks Below Rivals on a Popular Chat Benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, "Llama-4-Maverick-17B-128E-Instruct," was ranked below models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was "optimized for conversationality," the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we've written about before, LM Arena has never been the most reliable measure of an AI model's performance, for various reasons. Still, tailoring a model to a benchmark, besides being misleading, makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with "all types of custom variants."

"'Llama-4-Maverick-03-26-Experimental' is a chat-optimized version we experimented with that also performs well on LM Arena," the spokesperson said. "We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We're excited to see what they will build and look forward to their ongoing feedback."
