New AI search products from OpenAI and other industry leaders are prompting news companies to reconsider potential deals with AI firms that require news content to answer real-time queries about current events.
The Big Picture: Negotiations between tech and news industries over AI have largely focused on providing data for training large language models (LLMs). Now, talks are shifting to address narrower use cases where news publishers may have more leverage. LLMs can be trained on a wide array of texts, but accurately answering queries about current events requires access to smaller pools of vetted, real-time information.
The process of providing answers based on specific data sets is called retrieval-augmented generation (RAG). RAG helps make LLMs more accurate and reduces, but doesn't eliminate, hallucinations or incorrect answers.
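To make the idea concrete, here is a minimal sketch of the retrieval half of RAG. Everything in it is an assumption made for illustration: the `Article` type, the sample publishers and headlines, and the keyword-overlap scoring are hypothetical, and production systems generally use far more sophisticated ranking. The final step, sending the prompt to an LLM, is left out.

```python
import re
from dataclasses import dataclass

# Small, hypothetical stopword list so common words don't count as matches.
STOPWORDS = {"the", "a", "an", "of", "to", "on", "about", "what", "did"}


@dataclass
class Article:
    publisher: str  # hypothetical outlet name, for illustration only
    headline: str
    body: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: list[Article], k: int = 2) -> list[Article]:
    """Return up to k articles sharing the most tokens with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(a.headline + " " + a.body)), a) for a in corpus]
    # Drop articles with no overlap, then rank by overlap (highest first).
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [article for _, article in ranked[:k]]


def build_prompt(query: str, sources: list[Article]) -> str:
    """Assemble a prompt that asks the model to answer only from the sources."""
    lines = ["Answer using only the numbered sources and cite them by number."]
    for i, a in enumerate(sources, start=1):
        lines.append(f"[{i}] {a.publisher}: {a.headline}. {a.body}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Invented sample data; not real publishers or stories.
SAMPLE_CORPUS = [
    Article("Example Daily", "City council approves transit budget",
            "The council voted on Tuesday to approve the transit budget."),
    Article("Sample Gazette", "Local team wins championship",
            "Fans celebrated downtown after the final game."),
    Article("Example Daily", "Transit agency delays new rail line",
            "The agency said the rail line will open next year."),
]

if __name__ == "__main__":
    question = "What did the city council decide about the transit budget?"
    print(build_prompt(question, retrieve(question, SAMPLE_CORPUS)))
```

Because the model is handed a small set of named sources alongside the question, its answer can carry attribution back to the publisher, which is part of why licensed, vetted news content matters for this approach.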
Driving the News: The recent rollout of OpenAI's SearchGPT and Microsoft's Bing generative search product highlighted evolving partnerships between Big Tech firms and news publishers as LLM makers integrate more RAG-based approaches into their products.
- OpenAI is currently testing SearchGPT with several news publishers, including The Atlantic and News Corp. SearchGPT's answers feature "clear, in-line, named attribution and links" so users can see where the information comes from and access more results via sidebar links.
- Publishing partners can manage how their content appears in SearchGPT.
Yes, But: It's unclear whether new generative AI-powered search engines will provide publishers as much revenue as traditional search engines, mainly Google's, did. The old model of sending traffic to publishers via search page links has been effective for the past 20 years. OpenAI is experimenting with a revenue-sharing model for creators within its GPT Store. However, deals with news publishers for RAG are built on licensing fees, not revenue shares. Smaller startups like Tollbit are attempting to create marketplaces for revenue sharing between AI firms and news companies, but these require broad participation to succeed.
The Big Picture: AI firms argue it's legal to train their models on anything publicly available online. Still, many publishers believe their content is protected under copyright law. The New York Times' lawsuit against OpenAI and Microsoft may clarify this issue, but the case could take years. Meanwhile, news firms and AI companies are negotiating RAG deals to avoid new copyright conflicts. Notably, OpenAI states that news sites can appear in SearchGPT results even if they opt out of generative AI training.
What to Watch: While OpenAI seems eager to make deals with newsrooms, other AI companies are hesitant. Anthropic hasn't confirmed any publisher deals or plans to make them. Meta is debating its approach, with some executives, including CEO Mark Zuckerberg, skeptical of media deals due to past experiences. Others believe Meta will need agreements with news providers for its AI chatbot, Meta AI, to be accurate. Meta declined to comment. Perplexity is seeking publishing partners for a new pilot program, but media firms have objected to its interim use of content. Forbes has threatened legal action against Perplexity, and Condé Nast sent a cease-and-desist letter over data scraping.