Art generated by Midjourney.

Author Paranoia About AI Finds New Targets

AI founders need to watch out. It’s no secret that publishers and writers are sensitive about large-language models using their data for training. But just how sensitive became clear in the recent shuttering of a six-year-old publishing dataset called Prosecraft.

The service, created in 2017 by Benji Smith, founder and CEO of word processor Shaxpir, was intended to be a helpful resource for authors. It ranked titles based on how passive or vivid their language was. For instance, it gave Lewis Carroll’s “Alice’s Adventures in Wonderland” an 83.94% “vividness” score. One issue: it scraped those books from the web without the authors’ permission. Even so, people didn’t seem to pay it much mind until the rise of generative AI, which has raised concerns about LLMs training on copyrighted material.

While Prosecraft itself has little to do with LLMs, the sheer amount of easily accessible text it offered rang alarm bells for writers such as “Little Fires Everywhere” author Celeste Ng, who feared the data could be used to train LLMs. This weekend, a few worried tweets from writers who had only recently discovered the site kicked off a firestorm on X (formerly known as Twitter). Smith quickly buckled and took the dataset down. By that point, Prosecraft had ingested 27,000 books. Smith did not respond to a request for comment.
