Recap: Why Companies Are Betting on Small AI Models

Ever since ChatGPT exploded, tech companies have been racing to replicate its success with their own large language models, from Google’s Gemini to Meta Platforms’ Llama. But bigger isn’t always better. Small language models (SLMs) promise enterprises less lag, lower costs and a smaller environmental footprint.
Amir Efrati, executive editor of The Information, sat down with three artificial intelligence developers to discuss the advantages of using SLMs, how the models are primed to tackle AI’s shortcomings, and what the future might hold for AI models:
- Kathy Baxter, principal architect of ethical AI practice, Salesforce
- Jonathan Frankle, chief neural network scientist, Databricks
- Bindu Reddy, CEO and co-founder, Abacus AI
The Advantage of Going Small
Language models are measured by their parameters—the numerical values learned during training. Unlike LLMs, which can have upward of 30 billion parameters, SLMs are trained on much smaller datasets and perform narrower functions, so they can run with far fewer parameters. That, said Kathy Baxter, means they use far less computing power.
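For a rough sense of scale, here is a minimal sketch of how parameter counts grow with model size (in Python; the layer and dimension values are illustrative assumptions, not any particular model’s real configuration):

```python
# Back-of-the-envelope parameter count for a decoder-only transformer.
# The configurations below are illustrative, not any vendor's real model.

def transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Approximate count: token embeddings plus attention and MLP weights.
    Ignores layer norms, biases and the output head for simplicity."""
    embeddings = vocab_size * d_model
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 8 * d_model * d_model         # up- and down-projections at 4x width
    return embeddings + n_layers * (attention + mlp)

small = transformer_params(n_layers=24, d_model=2048, vocab_size=32_000)
large = transformer_params(n_layers=80, d_model=8192, vocab_size=128_000)
print(f"small: {small / 1e9:.1f}B parameters")  # ~1.3B
print(f"large: {large / 1e9:.1f}B parameters")  # ~65.5B
```

Fewer parameters mean fewer matrix multiplications per token, which is where the savings in compute, cost and energy come from.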
“Small models don’t scrape the web. The result is a model that requires a whole lot less data and, more importantly for me, it’s better for the environment,” Baxter said.
The computing power required to run LLMs is also becoming prohibitively expensive, noted Bindu Reddy. She pointed to the debut this week of Meta’s Llama 3.1 405B model.
“Everyone has been trying to get hold of GPUs because of this release,” she said. “But even if you have the money, you can’t get it.”
A Model for Every Occasion
One of the big selling points of small models is their versatility. The panelists agreed that the ability to customize SLMs to perform specific tasks is what makes them such a powerful tool. At Salesforce, Baxter said she prefers employing different small models for different tasks.
Jonathan Frankle added that smaller models use more-manageable datasets, providing more transparency and alleviating some of the black-box issues of larger models.
Referencing how Databricks customers use the platform to train SLMs, he said:
“They can train it on exactly the data they want to train on, and they know exactly what data went into that model. [With SLMs] you can build exactly what you need, nothing more.”
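Frankle didn’t walk through code, but the workflow he describes, training on a fully known and curated dataset, looks roughly like this sketch (in Python, using Hugging Face’s transformers library; the model name and the JSONL path are illustrative assumptions, not Databricks’ actual tooling):

```python
# Sketch: fine-tuning a small open model on a curated dataset with
# Hugging Face transformers. The model name and the data file path
# are illustrative assumptions, not Databricks' actual workflow.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-410m"  # example small model, ~410M parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # this tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Every example comes from a file you control, so you know exactly
# what data went into the model: Frankle's transparency point.
data = load_dataset("json", data_files="curated_corpus.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetune", num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```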
When asked about their favorite SLMs, Baxter called out Google’s Gemma, and Reddy said Meta’s newly released Llama 3.1 has some impressive metrics. Frankle preferred not to play favorites, saying that with small models you don’t need to.
Tackling AI’s Shortcomings
When asked whether smaller models can help mitigate some of AI’s more well-known challenges, like bias and hallucinations, Frankle said that while they could theoretically reduce bias by offering better control of the training data, he felt data wasn’t the only culprit.
“There are a lot of ways that bias can slip into the system. Who knows if it’s the training procedure, the model architecture or the setting in which you deploy it,” he said, adding that just as cybersecurity requires defense in depth, battling bias requires monitoring all layers of a language model.
When it comes to reducing hallucinations, many developers turn to retrieval-augmented generation (RAG) systems, which Reddy said perform better with LLMs. Frankle disagreed that model scale affects the frequency of hallucinations, saying the issue is more about data quality. Fine-tuning a model on data that gives it contextual information to pull from should limit hallucinations, he said.
Baxter added that another solution is teaching the model to recognize when it doesn’t know something, so that it can inform a user it doesn’t have the right information rather than making things up.
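Neither panelist described an implementation, but the two ideas, retrieving context and declining to answer when retrieval comes up empty, combine naturally. Here is a minimal sketch (in Python; the bag-of-words `embed` stand-in, the sample documents and the 0.3 similarity threshold are all illustrative assumptions):

```python
import math
import re
from collections import Counter

# Toy retrieval-augmented generation with an "I don't know" fallback.
# embed() is a bag-of-words stand-in for a real embedding model, and
# the 0.3 similarity threshold is an illustrative assumption.

DOCS = [
    "Our returns policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
]

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def answer(question: str, threshold: float = 0.3) -> str:
    q = embed(question)
    best_doc, score = max(((d, cosine(q, embed(d))) for d in DOCS),
                          key=lambda pair: pair[1])
    if score < threshold:
        # Baxter's point: admit the gap instead of making things up.
        return "I don't have the right information to answer that."
    # A real system would pass best_doc to the model as grounding context.
    return f"Based on our records: {best_doc}"

print(answer("What are your support hours?"))   # grounded answer
print(answer("What is the meaning of life?"))   # falls back to "I don't know"
```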
The Future of Small Models
The panel agreed that AI models were trending lighter, with smaller models becoming more efficient—which could have big implications for mobile applications.
“You’ll be able to get more capacity onto mobile devices without also requiring so much power that you run out of battery,” said Frankle.
As optimizing compute power, cost and speed becomes increasingly important, Reddy said she expects the orchestration approach—whereby a large model analyzes each task, breaks it into component parts and routes them to smaller, specialized SLMs—will gain in popularity.
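Reddy didn’t detail an implementation, but the routing pattern she describes looks roughly like this sketch (in Python; the task names and both stub functions are illustrative assumptions, not Abacus AI’s actual system):

```python
# Sketch of the orchestration pattern Reddy describes: a large model
# decomposes a request, and each piece is routed to a specialized SLM.
# The task names and both stub functions are illustrative assumptions.

SPECIALISTS = {
    "summarize": "slm-summarizer",
    "translate": "slm-translator",
    "extract": "slm-extractor",
}

def decompose(request: str) -> list[tuple[str, str]]:
    """Stand-in for the large orchestrator model, which would normally
    be prompted to split a request into (task_type, payload) pairs."""
    return [("summarize", request), ("translate", request)]

def call_model(model: str, payload: str) -> str:
    """Stand-in for an inference call to a hosted small model."""
    return f"[{model} output for: {payload[:40]}...]"

def orchestrate(request: str) -> list[str]:
    outputs = []
    for task_type, payload in decompose(request):
        model = SPECIALISTS.get(task_type, "slm-generalist")  # fallback route
        outputs.append(call_model(model, payload))
    return outputs

for line in orchestrate("Summarize this report, then translate the summary."):
    print(line)
```

The appeal of the pattern is that the expensive large model runs only once per request, while the cheap specialized models do the bulk of the work.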
One thing all panelists agreed on is that we’re still in the early stages of knowing what’s possible with SLMs, though Frankle said this wasn’t unique to those models:
“I think I can answer any question you might ask about our understanding of [generative AI] by saying, ‘We’re still at the beginning.’”