Reddit CEO: Microsoft and other AI search engines must pay to use ‘our data’

Reddit CEO Steve Huffman has insisted that the social media platform will continue to block AI companies, including Microsoft, from scraping data on its site until it is paid and has a say in how the content is used. He said Reddit will not compromise on the unlicensed use of its data to train AI models, according to The Verge.

Over the last few months, Reddit has changed its policies in an attempt to prevent AI developers from scraping its user data, posts, and communities without consent or payment. The company has since concluded a deal worth $60 million with Google, allowing the tech giant to use its content. Reddit made a similar agreement with ChatGPT-maker OpenAI in May.

Microsoft profits off Reddit’s free content

However, Microsoft has continued to use Reddit’s content to build the AI features in its Bing search engine without permission, Huffman alleged. The Reddit CEO accused Microsoft of profiting off his firm’s content. He said Microsoft scraped the data for free and then sold it to AI companies through the Bing API.

Eventually, Reddit blocked Microsoft from crawling its site, meaning Bing could no longer surface Reddit content in its search results. Reddit relies on the Robots Exclusion Protocol, implemented through a robots.txt file, which websites use to tell web crawlers which parts of a site they may access. Compliance is voluntary, so the file discourages unauthorized use of a site’s data rather than technically preventing it. In a recent interview with The Verge, Huffman stated:

“We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use. That’s their real position.”

Huffman revealed that the three companies – Microsoft, Anthropic and Perplexity – and other smaller AI firms have refused to negotiate payment for scraping Reddit’s content. The entities typically argue that the data is publicly available information and can be used under fair use principles. Salesforce previously defended its use of YouTube content on the same grounds. Apple has also said it trained its AI on publicly available data.

“Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for,” Huffman said. “[This] has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used.”
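For readers curious how the robots.txt mechanism mentioned above works in practice, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules, crawler names (`ExampleLicensedBot`, `OtherCrawler`), and URL are hypothetical and written purely for illustration; they are not Reddit's actual file. The sketch shows how a well-behaved crawler would check the rules before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
# One named crawler is allowed everywhere; every other crawler is blocked.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/r/some_community/"
    for agent in ("ExampleLicensedBot", "OtherCrawler"):
        print(agent, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

Nothing in the protocol enforces the result: a crawler has to choose to run a check like this and honor the answer, which is why publishers pair robots.txt with licensing deals and other blocking measures.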

Microsoft boss says web content is ‘freeware’

A Microsoft spokesperson said that the company “respects” the robots.txt protocol and stopped crawling Reddit on July 1. Meanwhile, Mustafa Suleyman, CEO of Microsoft AI, recently described content on the open web as “freeware.”

“…with respect to content that is already on the open web, the social contract of that content since the ’90s has been that it is fair use,” he said. “Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding.”

U.S. copyright law allows limited re-use of published content. According to the U.S. government’s copyright website, using limited portions of a work, including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports can fall under the doctrine of fair use. Search engines, for their part, are not compelled to compensate publishers.