Partner Content

Recap: Why Companies Are Betting on Small AI Models

By The Information Partnerships

Ever since ChatGPT exploded, tech companies have been racing to replicate its success with their own large language models, from Google’s Gemini to Meta Platforms’ Llama. But bigger isn’t always better. Small language models promise enterprises less lag, lower costs and a smaller environmental footprint.

Amir Efrati, executive editor of The Information, sat down with three artificial intelligence developers to discuss the advantages of using SLMs, how they can tackle AI’s shortcomings, and what the future might hold for AI models:

  • Kathy Baxter, principal architect of ethical AI practice, Salesforce
  • Jonathan Frankle, chief neural network scientist, Databricks
  • Bindu Reddy, CEO and co-founder, Abacus AI

The Advantage of Going Small

Language models are measured by parameters—the numerical values learned during training. Unlike LLMs, which can have upward of 30 billion parameters, SLMs are trained on much smaller datasets and have simpler functionality, meaning they can operate with fewer parameters. This, said Kathy Baxter, means they use far less computing power.
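
To make that compute gap concrete, here is a rough back-of-envelope sketch (ours, not the panel’s) of how parameter count translates into the memory needed just to hold a model’s weights:

```python
# Back-of-envelope: memory needed to hold a model's weights in 16-bit
# precision. Activations, optimizer state and KV cache add more on top.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, size_b in [("7B SLM", 7), ("70B LLM", 70), ("405B LLM", 405)]:
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB of weights at fp16")

# 7B SLM:   ~14 GB  -> fits on a single high-end GPU
# 70B LLM:  ~140 GB -> needs several data-center GPUs
# 405B LLM: ~810 GB -> needs a whole multi-GPU server
```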

“Small models don’t scrape the web. The result is a model that requires a whole lot less data and, more importantly for me, it’s better for the environment,” Baxter said.

The computing power required to run LLMs is also becoming prohibitively expensive, noted Bindu Reddy. She pointed to the debut this week of Meta’s Llama 3.1 405B model.

“Everyone has been trying to get hold of GPUs because of this release,” she said. “But even if you have the money, you can’t get it.”

A Model for Every Occasion

One of the big selling points of small models is their versatility. The panelists agreed that the ability to customize SLMs to perform specific tasks is what makes them such a powerful tool. At Salesforce, Baxter said she prefers employing different small models for different tasks.

Jonathan Frankle added that smaller models use more-manageable datasets, providing more transparency and alleviating some of the black box issues of larger LLMs.

Referencing how Databricks customers use the platform to train SLMs, he said:

“They can train it on exactly the data they want to train on, and they know exactly what data went into that model. [With SLMs] you can build exactly what you need, nothing more.”
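
As a concrete illustration of that point, a minimal fine-tuning sketch might look like the following; the tooling (Hugging Face transformers and datasets), the base model and the dataset file are our illustrative assumptions, not anything the panelists specified:

```python
# Minimal sketch: fine-tune a small open model on a curated in-house dataset,
# so every training example is one the team chose. Model name and data file
# are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "google/gemma-2b"  # a small open model; any SLM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files="curated_examples.jsonl")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because the dataset is small and hand-picked, the team can audit every example that shaped the model’s behavior, which is the transparency Frankle is describing.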

When asked about their favorite SLMs, Baxter called out Google’s Gemma, and Reddy said Meta’s newly released Llama 3.1 has some impressive metrics. Frankle preferred not to play favorites, saying that with small models you don’t need to.

Tackling AI’s Shortcomings

When asked if smaller models can help mitigate some of AI’s more well-known challenges, like bias and hallucinations, Frankle said while they could theoretically reduce bias by offering better control of the data used for training, he felt data wasn’t the only culprit.

“There are a lot of ways that bias can slip into the system. Who knows if it’s the training procedure, the model architecture or the setting in which you deploy it,” he said, adding that just as cybersecurity requires defense in depth, battling bias requires monitoring all layers of a language model.

When it comes to reducing hallucinations, many developers turn to retrieval-augmented generation (RAG) systems, which Reddy said perform better on LLMs. Frankle disagreed that model scale impacted the frequency of hallucinations, saying the issue was more about data quality. Fine-tuning on data that gives the model contextual information to pull from should limit hallucinations, he said.
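
For readers unfamiliar with the technique, a RAG system retrieves relevant documents and feeds them to the model as context at answer time. Here is a toy sketch; the hashed bag-of-words embedding and the stubbed model call are stand-ins for a real embedding model and SLM:

```python
# Toy retrieval-augmented generation sketch. embed() and generate() are
# illustrative stand-ins for a real embedding model and a real SLM call.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words embedding; a real system uses a trained model."""
    v = np.zeros(256)
    for word in text.lower().split():
        v[hash(word) % 256] += 1.0
    return v

def generate(prompt: str) -> str:
    """Stand-in for a call to a small language model."""
    return f"[SLM answer grounded in: {prompt[:60]}...]"

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    def cosine(d: str) -> float:
        e = embed(d)
        return float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
    return sorted(docs, key=cosine, reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    context = "\n\n".join(retrieve(query, docs))
    return generate(f"Answer using only this context:\n{context}\n\nQ: {query}")
```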

Baxter added that another solution is teaching the model to recognize when it doesn’t know something, so that it can inform a user it doesn’t have the right information rather than making things up.
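
One common way to approximate that behavior in a RAG pipeline (our assumption about implementation, not a method Baxter described) is to abstain when retrieval confidence is too low to support a grounded answer. Continuing the sketch above:

```python
# Decline to answer when no retrieved document is similar enough to the
# query to ground a response. The threshold value is illustrative and would
# be tuned on held-out questions.
ABSTAIN_THRESHOLD = 0.35

def answer_or_abstain(query: str, docs: list[str]) -> str:
    q = embed(query)
    def cosine(d: str) -> float:
        e = embed(d)
        return float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
    if max(cosine(d) for d in docs) < ABSTAIN_THRESHOLD:
        return "I don't have reliable information on that."
    return answer(query, docs)
```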

The Future of Small Models

The panel agreed that AI models were trending lighter, with smaller models gaining efficiency, which could have big implications for mobile applications.

“You’ll be able to get more capacity onto mobile devices without also requiring so much power that you run out of battery,” said Frankle.

As optimizing compute power, cost and speed becomes increasingly important, Reddy said she expects growing adoption of the orchestration model, in which a large model analyzes each task, breaks it into component parts and routes them to specialized SLMs.
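
In code, that routing pattern might look like the sketch below; the keyword-based decompose() is a toy stand-in for the large orchestrator model, and the specialist registry is hypothetical:

```python
# Sketch of the orchestration pattern: a large model breaks a task into
# (category, subtask) pairs and each part is routed to a specialized SLM.
# decompose() is a toy stand-in for the orchestrator model, and generate()
# is the stubbed SLM call from the earlier sketch.
SPECIALISTS = {
    "sql":     lambda t: generate(f"Write SQL for: {t}"),
    "code":    lambda t: generate(f"Implement: {t}"),
    "summary": lambda t: generate(f"Summarize: {t}"),
}

def decompose(task: str) -> list[tuple[str, str]]:
    """Stand-in for the orchestrator LLM's analysis of the task."""
    if "report" in task.lower():
        return [("sql", "pull last quarter's numbers"),
                ("summary", "summarize the query results")]
    return [("summary", task)]

def orchestrate(task: str) -> str:
    results = [SPECIALISTS.get(cat, SPECIALISTS["summary"])(sub)
               for cat, sub in decompose(task)]
    return "\n".join(results)
```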

One thing all panelists agreed on is that we’re still in the early stages of knowing what’s possible with SLMs, though Frankle said this wasn’t unique to those models:

“I think I can answer any question you might ask about our understanding of [generative AI] by saying, ‘We’re still at the beginning.’”
