OpenAI and image generator Midjourney plan to pay to use public Tumblr content to train their AI models, according to a 404 Media report based on internal documents.

404 Media reported that a deal between Automattic, Tumblr's parent company, and the two AI companies is imminent, though it did not specify which types of data would be sold to each party. The agreement reportedly also involves the sale of data from WordPress.com, another platform owned by Automattic.

On Feb. 27, the Tumblr and WordPress.com staff blogs published posts detailing how user-generated content is used for AI training. Those posts, however, did not tell users that Automattic was in talks to sell such data.

Here’s a breakdown of how this potential sale could impact your content on Tumblr.

Which content will Automattic reportedly sell?

According to 404 Media, the documents it reviewed did not specify exactly which data would be sold to each party. It is also unclear whether the deal covers only future Tumblr posts or past content as well. AI companies have faced criticism for making extensive use of “publicly available” content to train their models, given that much publicly available online content remains protected by copyright.

A support article on OpenAI’s website states that “ChatGPT and our other services are developed using information that is publicly available on the internet,” along with other sources. Presumably, OpenAI has already scraped and used any content that was once publicly accessible on Tumblr. The deal could therefore amount to an acknowledgment of that by OpenAI and Midjourney, which would now be paying for the use of future Tumblr content as well.

404 Media asked Automattic for comment on the deal but did not receive a response. However, Automattic posted a statement titled “Protecting User Choice,” in which the company said, “We currently block, by default, major AI platform crawlers—including ones from the biggest tech companies—and update our lists as new ones launch.” It is unclear when the site began blocking these crawlers, especially given that OpenAI has been training its models on public content for years.

How do I opt out?

To prevent your public Tumblr content from being shared with third parties, you must turn on a new “Prevent third-party sharing” option in the settings of each individual blog you manage. This has to be done in a web browser, not in the Tumblr app. The change is documented in Tumblr’s user privacy support article.

If you previously chose to restrict searching of your blog, the new “Prevent third-party sharing” option will be turned on by default.

But what happens if you don't activate the setting now and instead do it in three months? 404 Media cited a document from Feb. 23 in which a Tumblr staff member raised that question, asking, “Do we have assurances that if a user opts out of their data being shared with third parties, our existing data partners will be notified of such a change and remove their data?”

In response, Automattic’s head of AI, Andrew Spittle, stated, “We will inform existing partners regularly about users opting out… We aim to make this an ongoing process, advocating for past content to be excluded based on current preferences. We will request the removal of content from future training processes. We believe our partners will comply based on our discussions with them so far.”

Is this normal?

It is, at the very least, becoming the new standard. OpenAI is licensing news content from the Associated Press and is reportedly in discussions to do the same with CNN, Time, and Fox. Reddit is working with Google to commercialize its content database.

It was only a matter of time before Automattic began selling its data, especially given the financial challenges Tumblr is facing. Throughout its 17-year existence, Tumblr has never achieved profitability, and Automattic has struggled to reverse this trend. In November, TechCrunch reported that resources were redirected from Tumblr to support other projects within Automattic.

Elizabeth de Luna
Culture Reporter

Elizabeth is a digital culture reporter covering the internet’s influence on self-expression, fashion, and fandom. Her work explores how technology shapes our identities, communities, and emotions. Before joining Mashable, Elizabeth spent six years in tech. Her reporting can be found in Rolling Stone, The Guardian, TIME, and Teen Vogue.