Reddit sues Perplexity for scraping of posts, expanding user data battle with AI industry

Social media giant Reddit has filed a lawsuit against AI company Perplexity, alleging that it illegally scraped user posts to train its AI model, marking the latest data rights conflict between content owners and the AI industry.
The complaint, filed in New York federal court on Wednesday, also names three other defendants that Reddit says helped Perplexity collect its data: Lithuanian data scraper Oxylabs, “former Russian botnet” AWMProxy, and Texas startup SerpApi.
Reddit claimed that the three smaller organizations were able to extract its copyrighted content by “masking their identities, hiding their locations, and disguising their web scrapers as ordinary people.”
Perplexity, which operates an AI-powered search engine, denied the allegations and accused Reddit of “extortion” and of opposing the open internet. SerpApi told CNBC that it “strongly disagrees” with Reddit’s allegations and intends to defend itself in court. CNBC was unable to reach Oxylabs and AWMProxy.
The case is one of many lawsuits filed by content owners accusing AI firms of using copyrighted materials without permission to train large language models. Reddit, in particular, has been at the forefront of this battle, filing a similar lawsuit, still ongoing, against AI startup Anthropic in June.
Reddit Chief Legal Officer Ben Lee said in a statement shared with CNBC that AI companies are “locked in an arms race for quality human content” and that the pressure is fueling “an industrial-scale ‘data laundering’ economy.”
Scrapers bypass technological protections to steal data and then sell it to customers hungry for training material. Reddit is a prime target because it is one of the largest and most dynamic collections of human conversations ever created.
Reddit, which is home to a community of more than 100,000 interest-based “subreddits,” said in its lawsuit that user posts had become the most cited source for AI-generated replies on Perplexity.
It added that after it sent a cease-and-desist letter to Perplexity, the volume of citations to Reddit increased “forty-fold.”
AI researchers have previously noted that Reddit’s large number of moderated conversations could help AI chatbots produce more natural-sounding responses.
In the age of AI, Reddit has sought to leverage its massive data pool, allowing access to it only through AI-related licensing agreements. The social media company has signed such agreements with OpenAI and Alphabet’s Google.
In response to the lawsuit, Perplexity argued in a post on the Reddit platform that it did not train its AI models on the content, but merely summarized and quoted public Reddit discussions. For that reason, it said, it was “impossible” to sign a license agreement.
“A year ago, after we disclosed this, Reddit insisted we still pay even though we had legal access to Reddit data. Giving in to strong-arm tactics is not the way we do business,” the statement said, describing the case as “a show of force in Reddit’s training data negotiations with Google and OpenAI.”
Noting that data licensing has become an increasingly important source of revenue for Reddit, Perplexity added that it believes the case is a sad example of what happens when publicly available data becomes a large part of a publicly traded company’s business model.
In February, Reddit COO Jen Wong told trade publication Adweek that AI licensing deals with Google and OpenAI account for about 10% of Reddit’s revenue.




