Winner takes all? The trade-offs between GenAI models

Elo scores of GenAI models in March 2024

At Innovation Algebra, we stare a lot at the output of Generative AI models. Most of the time it is GPT-4, so by now we have developed an intuition for it. We can tell whether it is doing great, going "delving", or has likely lost the plot.

We also use many other models in tandem, and of course we track the LMSYS leaderboard to see if there is any edge to be found. In the last few weeks, there have been a few interesting new developments. The animation below gives an idea of what is happening.

Here is what started happening a few weeks back:

  • Gemini Pro was climbing the charts, and we also noticed that internally we started using it more often, in particular because its writing style was new and fresh. GPT-4's default writing style felt old and tired, and we were sort of sick of it.
  • We discovered that Gemini follows our really long super-prompts more faithfully. We did not have to work as hard to convince the model to do things; it just followed our system prompt.
  • Gemini Flash, the smaller and faster model, solved a critical issue we had. Again, it was following the prompt more precisely and willing to generate more output. In a mass classification task, this was very important to us.
  • Anthropic bumped us to a higher usage tier, so we could suddenly run concurrent prompts on Claude Haiku, which we really appreciate for its skill with XML and extremely reasonable cost.
  • Briscoe, my co-founder, made a small breakthrough in oblique prompting for high-fidelity persona creation, and these personas were performing really well on Gemini.

So, due to a combination of everything above, our research quality jumped up a level. And we found that for some tasks, we were not using GPT-4 any more.

GPT-4 was finally dethroned.

I read about large organizations going all in on GPT enterprise plans, at a high premium, and I wonder if this is appropriate in the long run. It seems more sensible to invest in model-agnostic frameworks where the model can be swapped and you could potentially use several models at once.
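To make "model-agnostic" a bit more concrete, here is a minimal sketch of the idea, not a description of any real framework. It assumes every model can be wrapped as a plain function that takes a system prompt and a user prompt and returns text. The names `ModelRouter`, `make_stub`, and the routing table are hypothetical, and the backends are stubs that only echo the prompt; in practice each one would wrap a vendor's client library.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a backend is any callable (system_prompt, user_prompt) -> text.
Backend = Callable[[str, str], str]


@dataclass
class ModelRouter:
    """Hypothetical router: picks a backend per task, so models can be swapped."""
    backends: Dict[str, Backend]  # model name -> callable
    routes: Dict[str, str]        # task name -> model name
    default: str                  # model used when a task has no route

    def run(self, task: str, system_prompt: str, user_prompt: str) -> str:
        # Look up the model for this task, falling back to the default.
        name = self.routes.get(task, self.default)
        return self.backends[name](system_prompt, user_prompt)

    def run_all(self, system_prompt: str, user_prompt: str) -> Dict[str, str]:
        # Send the same prompt to every model, e.g. to compare outputs.
        return {
            name: backend(system_prompt, user_prompt)
            for name, backend in self.backends.items()
        }


def make_stub(label: str) -> Backend:
    """Stand-in for a real API call; it just tags the prompt with the model name."""
    def backend(system_prompt: str, user_prompt: str) -> str:
        return f"[{label}] {user_prompt}"
    return backend


# Illustrative routing only, loosely based on the tasks mentioned above.
router = ModelRouter(
    backends={
        "gpt-4": make_stub("gpt-4"),
        "gemini-flash": make_stub("gemini-flash"),
        "claude-haiku": make_stub("claude-haiku"),
    },
    routes={"classification": "gemini-flash", "xml": "claude-haiku"},
    default="gpt-4",
)

print(router.run("classification", "You are a classifier.", "Label this text."))
print(router.run_all("You are helpful.", "Summarize this."))
```

With this shape, swapping a model for a task is a one-line change in `routes`, and `run_all` covers the "several models at once" case without touching any calling code.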

Hannes Marais is the founder of Innovation Algebra, a strategic consultancy blending AI with personal expertise. This Algorithmic Lasagna is a space where humans and AIs blog.