Inside the Art Automaton

This article was first published on LinkedIn in June 2023. It has been edited and updated in part and republished here in January 2024.

In my previous article about generative AI art, I made the following claim:

My work was taken. They went to my website and found images of my prints and they used those images to train their programmes. And now those programmes are being used by people who are positioning themselves in direct competition with me.

But what does it mean for me to say that they “took my work”? And who are “they”?

The answers are much more complicated than I first anticipated.

To better understand how these companies used my work and the work of millions of other artists and creatives, I wanted to follow the journey of one illustration from my website all the way into the belly of the beast.

But first, a disclaimer

I am an illustrator. I am not an expert in generative AI (I had not even heard of it until late 2022). As such, I approached this as a journey of discovery; a way of finding out how my artwork ended up inside these art automatons.

The journey I describe is what I – an artist affected by all of this – have unearthed about how that process works. I have undoubtedly missed a lot of details, not understood things, and oversimplified a very complicated and clever process. This is my best attempt at understanding how all of this works.

People from both sides of the debate have read this article and I have incorporated their feedback. However, if you are an expert and you feel I am misrepresenting anything then please let me know. I didn’t want to learn all about this topic but, now that I find myself quite far down the rabbit hole, I would like to know if I have taken any wrong turns.

The journey begins

In February of 2021 I uploaded this image to my website.

Steampunk victorian gentleman top hat goggles character illustration

It is a design I created and uploaded to my website so that I could promote my character illustration work. I own this image. I painted it and the copyright in the image belongs to me.

When it was uploaded it was automatically given its own web address (URL). Visitors to my site will generally never visit that URL. Instead, they will view the image as one of several designs shown on my character art page.

When I uploaded it, I added this alt-text to the image: “Steampunk victorian gentleman top hat goggles character illustration”. (I recognise that this is not a good example of what should be used as alt-text.)

Common crawl

At a point sometime after February 2021, a crawl bot visited and collected data about my site. One of the things it recorded was the URL of that image and the associated alt-text.
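To make that step a little more concrete, here is a minimal sketch of how an image address and its alt-text can be read out of a web page. This is my own illustration and not Common Crawl's code: the page address, the file names and the `collect_image_alt_pairs` helper are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageAltCollector(HTMLParser):
    """Collects (image URL, alt-text) pairs from the <img> tags on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        alt = attributes.get("alt")
        # Only keep images that have both an address and a description.
        if src and alt:
            full_url = urljoin(self.page_url, src)
            self.pairs.append((full_url, alt.strip()))


def collect_image_alt_pairs(page_url, html):
    """Hypothetical helper: return every (URL, alt-text) pair found in the HTML."""
    collector = ImageAltCollector(page_url)
    collector.feed(html)
    return collector.pairs


# A made-up page, loosely modelled on a portfolio gallery.
example_html = """
<div class="gallery">
  <img src="/images/steampunk-gentleman.jpg"
       alt="Steampunk victorian gentleman top hat goggles character illustration">
  <img src="/images/spacer.gif">
</div>
"""

pairs = collect_image_alt_pairs("https://example.com/character-art", example_html)
for url, alt_text in pairs:
    print(url, "|", alt_text)
```

Note that the second image is ignored because it has no alt-text. Nothing is copied from the image itself at this stage; only its address and description are recorded.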

The crawl bot was owned by Common Crawl, one of a number of companies sending bots across the internet to visit websites and catalogue what they find. They are an American non-profit set up with “the goal of democratizing access to web information” 1.

“Everyone should have the opportunity to indulge their curiosities, analyze the world and pursue brilliant ideas. Small startups or even individuals can now access high quality crawl data that was previously only available to large search engine corporations.” 2

Web crawlers are common on the internet. Data catalogued by crawl bots is what allows our Google overlords to serve up search result information when we type something into a search engine 3.

Common Crawl is a little different from Google. They are cataloguing websites in order to build up a repository of data that can be used by a range of different organisations for a range of different purposes. And, sometime after February 2021, the URL of my illustration ended up in their catalogue of website data along with the accompanying alt-text.

Next, to Germany

One organisation to make use of Common Crawl’s catalogue of website data was a German non-profit named LAION. Their principal goal:

“Releasing open datasets, code and machine learning models. We want to teach the basics of large-scale ML research and data management. By making models, datasets and code reusable without the need to train from scratch all the time, we want to promote an efficient use of energy and computing ressources (sic) to face the challenges of climate change.” 4

LAION wanted to build a vast dataset of image and alt-text pairs that they would, in turn, make available to other researchers interested in developing large machine learning models (the things that have become colloquially known as “AI”).

They took a catalogue of data from Common Crawl that contained around 50 billion URL and alt-text pairs. (That’s right: 50 billion!) One of these was the URL and alt-text for my illustration. They ran tests on the data to check whether the alt-text bore any similarity to the image itself and, having performed these checks and cleaned up the data, they were left with somewhere in the region of 6 billion URL and alt-text pairs 5. They named this dataset LAION-5B.
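LAION's own FAQ, quoted later in this article, says these checks involved calculating CLIP embeddings and similarity scores between pictures and texts. As far as I understand it, an embedding is a list of numbers summarising an image or a piece of text, and two embeddings can be compared to see how closely they match. The sketch below shows the general idea only. The tiny three-number embeddings, the example addresses and the threshold value are all assumptions made up for illustration, not LAION's data or settings.

```python
import math


def cosine_similarity(a, b):
    """Score how closely two lists of numbers point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_pairs(candidates, threshold):
    """Keep only the pairs whose image and alt-text embeddings are similar enough."""
    kept = []
    for candidate in candidates:
        score = cosine_similarity(
            candidate["image_embedding"], candidate["text_embedding"]
        )
        if score >= threshold:
            kept.append(candidate["url"])
    return kept


# Made-up embeddings. Real ones have hundreds of numbers, not three.
candidates = [
    {   # alt-text that describes the picture well
        "url": "https://example.com/a.jpg",
        "image_embedding": [0.9, 0.1, 0.2],
        "text_embedding": [0.8, 0.2, 0.1],
    },
    {   # alt-text that has little to do with the picture
        "url": "https://example.com/b.jpg",
        "image_embedding": [0.9, 0.1, 0.2],
        "text_embedding": [-0.1, 0.9, -0.3],
    },
]

SIMILARITY_THRESHOLD = 0.28  # assumed cut-off, chosen for illustration only
kept_urls = filter_pairs(candidates, SIMILARITY_THRESHOLD)
print(kept_urls)
```

Only the first pair survives the check, which is the kind of pruning that could take a catalogue from tens of billions of pairs down to a few billion.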

To complete their data checks, LAION downloaded a copy of my illustration, but this was deleted again once they had finished their checks, leaving them with a dataset full of URLs and alt-text. But LAION needed a way for users to be able to do something with all the information held in the dataset. They needed a way for users to download and use the actual images. So they built a tool to do this 6.

The img2dataset tool will take a catalogue of data (such as the LAION-5B dataset) and will save a copy of every image to your computer along with a text file containing the alt-text. Admittedly, it will take about a week to download all 6 billion images in the dataset, but for organisations working on machine learning models, I imagine that is time and resources well spent.
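In outline, a tool like this only has to loop over the catalogue, fetch each address, and write two files per entry. The following is a rough sketch of that loop and is not the img2dataset code: `save_pairs`, `fake_fetch`, the file naming and the example addresses are all hypothetical, and the "download" is faked so the example runs without touching the internet.

```python
import tempfile
from pathlib import Path


def save_pairs(rows, output_dir, fetch):
    """For each (url, alt_text) row, save the image bytes and a matching text file.

    `fetch` is any function that takes a URL and returns the image as bytes.
    Rows whose image can no longer be fetched are skipped.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for index, (url, alt_text) in enumerate(rows):
        try:
            image_bytes = fetch(url)
        except OSError:
            continue  # dead link: nothing to save for this row
        stem = f"{index:09d}"
        (output_dir / f"{stem}.jpg").write_bytes(image_bytes)
        (output_dir / f"{stem}.txt").write_text(alt_text, encoding="utf-8")
        saved += 1
    return saved


def fake_fetch(url):
    """Stand-in for a real download so the sketch runs without a network."""
    if "missing" in url:
        raise OSError("image no longer exists")
    return b"placeholder image bytes"


rows = [
    ("https://example.com/images/steampunk-gentleman.jpg",
     "Steampunk victorian gentleman top hat goggles character illustration"),
    ("https://example.com/images/missing.jpg",
     "An image that has since been deleted"),
]

output_dir = Path(tempfile.mkdtemp())
saved_count = save_pairs(rows, output_dir, fake_fetch)
print(saved_count, "image saved to", output_dir)
```

The point to notice is that the dataset itself stays a list of addresses. The copies of the images only come into existence on the computer of whoever runs the download.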

This tool is publicly available, as is the LAION-5B dataset, so it is theoretically possible for anyone with the time and resources to download all 6 billion image and alt-text pairs, including my illustration.

I wanted to mention here that, since this article was first written, it has been revealed that the checks LAION completed failed to find and remove the significant number of CSAM images that had been collected within the Common Crawl catalogue. As of January 2024, the LAION datasets have been temporarily removed and cannot be downloaded while the organisation attempts to clean them of CSAM images. This revelation is really worrying and highlights the seemingly slapdash way in which these datasets were constructed (not to mention the potential consequences for those people who have downloaded CSAM images to their computers as part of a LAION dataset).

Is this allowed?

Is it legal for just anyone to vacuum up huge swathes of data from websites? It seems that it is.

In 2017 a dispute between LinkedIn and a data analytics company, hiQ Labs, reached the US courts. hiQ Labs were scraping data from LinkedIn users and using it to analyse employee attrition, and LinkedIn wanted them to stop. In 2022 a US appeals court concluded that hiQ Labs’ scraping was unlikely to breach the Computer Fraud and Abuse Act. The rationale for this decision was that the data was publicly available, ie LinkedIn users were sharing the data on their public profiles and you didn’t need to log in to LinkedIn to view it 7.

With this precedent in mind, it appears that it is legal for anyone to catalogue publicly available information from the internet. That illustration I uploaded to my website is publicly available; you don’t need to log in to my website to view it. But while it is legal for someone to come along and catalogue my website, it doesn’t follow that it is legal for them to do whatever they like with the data once they have collected it. There are still protections in place. I still own the copyright in that image, after all.

Various legal commentators have written their summaries of the hiQ case and they have all been at pains to mention that the above decision does not grant free rein for companies to use the data they (legally) catalogued for any purpose they like.

“While the Ninth Circuit’s holding in hiQ indicates that Clearview AI and other companies that have adopted scraping data as a business model can avoid criminal liability under [Computer Fraud and Abuse Act], web scrapers may still face legal challenges under state law regimes and intellectual property laws.” 8

For example, the Information Commissioner’s Office (the body that oversees data protection in the UK) recently fined Clearview AI £7.5m for collecting “more than 20 billion images of people’s faces and data from publicly available information on the internet and social media platforms all over the world to create an online database” 9. While that is a different scenario from the one we are exploring, it highlights that there are still restrictions on what you are allowed to do with any data you have collected from the internet.

For their part, LAION are very clear that they only hold a catalogue of internet data; they do not hold copies of the images themselves and are therefore not impacted by copyright law:

“Does LAION datasets respect copyright laws? LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in.” 10

Which poses the question: have any researchers made use of the dataset and downloaded all 6 billion images?

Yes. Of course they have.

My illustration now makes the short journey to Munich, to the Machine Vision and Learning Group at Ludwig Maximilians University.

Researchers at LMU downloaded the LAION-5B dataset to use in the development of their new latent text-to-image diffusion model. (A diffusion model is what we have come to know as a generative AI art programme; see my earlier article for an introduction to these programmes.)

Working with the researchers at LAION, they took the 6 billion images and ran further checks and tests to curate a subset of more aesthetically-pleasing images. They called this dataset LAION-Aesthetics V2 and it contained around 600 million images 11.

My illustration was included in the 6 billion images making up LAION-5B:

Search result showing my illustration. The website it comes from allows you to search through the images in the LAION-5B dataset (please approach with caution; there are so many NSFW images it’s scary)

It also made the cut for the smaller LAION-Aesthetics V2 subset (you can search a small portion of it, about 2%, online).

My illustration – that I own, remember – has had quite a journey already:

  • Catalogued by a crawler bot from non-profit company Common Crawl
  • Passed to the non-profit company LAION and collated into a vast dataset called LAION-5B
  • Assessed and deemed aesthetically-pleasing enough to make it into the smaller dataset called LAION-Aesthetics V2
  • Downloaded by the research team at the Machine Vision and Learning Group at LMU (also known as “CompVis”)

So what did the team at Machine Vision and Learning Group do with my illustration and the other 599,999,999 images making up the LAION-Aesthetics V2 dataset?

We need to talk about Stable Diffusion

The “latent text-to-image diffusion model” built by the team at LMU is the programme we now know as Stable Diffusion. It was built by the researchers at LMU with support from other organisations, one of which was a startup called Stability.AI.

As far as I can tell, Stability.AI’s involvement was limited to providing the servers on which the model was trained 12. It takes a lot of computer processing power to train these models, it seems. I mention them specifically because Stability.AI have since taken over the running of Stable Diffusion, in a series of events I don’t claim to fully understand.

What did the researchers at LMU do with all of those servers? They used them to train their new diffusion model (Stable Diffusion) so that it could automate the process of creating images.

Let’s make some noise!

The training process is a complicated task that I have tried to explain in a simplified way below. This is the crucial step in understanding how my illustration was absorbed into Stable Diffusion itself and I have opted for a simplified – but, I believe, still accurate – explanation because there is a level of technical detail that (a) I don’t fully understand and (b) is not really useful in understanding the broader training process they went through. If you are interested in the deeper technical details, I would recommend the paper put out by LMU.

On their own, diffusion models are just code. They can’t do anything. They cannot generate images. In order to turn a diffusion model into something that can generate images, to turn it into something useful, it needs to be trained. And the way to train a diffusion model is to feed it vast numbers of images, allow it to destroy them, and then let it attempt to rebuild them.

The way it “destroys” an image is by adding noise to it. If you aren’t familiar with the concept of image noise, think of it like film grain or the static on a TV that is not properly tuned. If you were born after 1995 and neither of those references makes sense then I’m sorry, I can’t help you.

If you add noise to an image then you make it a little less crisp, a little more speckled. If you do this enough times then you arrive at an image that is completely unrecognisable and is just a mess of static.

4 step process showing images with different levels of noise

Above is an example series of images showing the de-noising process. The process of adding noise is the reverse of this, starting with the clear image on the right and gradually adding noise until it becomes completely unrecognisable (the image on the left).
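For anyone who prefers to see it in code, adding noise can be sketched in a few lines. This is a simplified illustration with assumptions of my own: the "image" is just four made-up brightness values, and `amount` is a single number between 0 (untouched) and 1 (pure static). Real models work on far larger grids of numbers and follow a carefully designed schedule of noise levels.

```python
import random


def add_noise(pixels, amount, rng):
    """Blend each pixel value with random (Gaussian) noise.

    amount = 0.0 returns the image untouched.
    amount = 1.0 returns pure noise with nothing of the image left.
    """
    keep = (1.0 - amount) ** 0.5   # how much of the original survives
    mix = amount ** 0.5            # how much noise is blended in
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


image = [0.1, 0.4, 0.9, 0.6]        # a made-up, four-pixel "image"
levels = [0.0, 0.25, 0.75, 1.0]     # from clean to unrecognisable

# The same random seed is used at every level so the rows are comparable.
noisy_versions = [add_noise(image, level, random.Random(42)) for level in levels]

for level, version in zip(levels, noisy_versions):
    print(level, [round(value, 2) for value in version])
```

At the final level the output no longer depends on the original pixels at all, which is the "mess of static" described above.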

As part of the training process, the researchers took my illustration and fed it into Stable Diffusion along with the alt-text. The model started by destroying the image, repeatedly adding noise to it until it was an unrecognisable mess. The model then attempted to reverse that process, gradually removing a little bit of noise over and over again until it got back to an image that looked like my original illustration.

Having successfully recreated my illustration, the model had “learnt” how to take an unrecognisable image full of noise and, step-by-step, remove little bits of noise until it arrives at an image that is recognisable. By linking this process to the alt-text that accompanied my illustration, the model has “learnt” how to do this in a way that results in an image featuring a “steampunk” “man” in a “top hat” and “goggles” rendered as an “illustration”.

Over and over and over again

Now imagine the model going through that process for 600 million different images. Taking an image and turning it into unrecognisable noise, then unwinding the process to get back to where it started, over and over again, 600 million times. And each time it does it, the model remembers what it did: how to remove noise in such a way that the end result is an image that looks like the thing described in the alt-text.
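A slightly more technical footnote to my simplified account: as I understand the published descriptions of diffusion models, a single training step does not walk all the way back to the finished picture. Instead the model is shown the image at one randomly chosen noise level and asked to guess the noise that was added. Its guess is scored, and the numbers inside the model are nudged to make the next guess better. The sketch below shows only the scoring part. The four-pixel image, the `untrained_model` stand-in and the range of noise levels are assumptions for illustration, and the nudging itself is left out.

```python
import random


def make_training_example(pixels, amount, rng):
    """Return (noisy pixels, the exact noise that was blended in)."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    keep = (1.0 - amount) ** 0.5
    mix = amount ** 0.5
    noisy = [keep * p + mix * n for p, n in zip(pixels, noise)]
    return noisy, noise


def mean_squared_error(guess, truth):
    """Score a guess: 0.0 is perfect, larger numbers are worse."""
    return sum((g - t) ** 2 for g, t in zip(guess, truth)) / len(truth)


def untrained_model(noisy_pixels, amount, caption):
    """Hypothetical stand-in for a model that has learnt nothing yet.

    It ignores its inputs and always guesses that no noise was added.
    """
    return [0.0 for _ in noisy_pixels]


rng = random.Random(0)
pixels = [0.2, 0.8, 0.5, 0.9]   # a made-up, four-pixel "image"
caption = "Steampunk victorian gentleman top hat goggles character illustration"

amount = rng.uniform(0.05, 0.95)                       # one random noise level
noisy, noise = make_training_example(pixels, amount, rng)

loss = mean_squared_error(untrained_model(noisy, amount, caption), noise)
perfect_loss = mean_squared_error(noise, noise)        # what a perfect guess scores
print("untrained score:", round(loss, 3), "perfect score:", perfect_loss)
```

Training repeats this for image after image, adjusting the model so that its score creeps towards the perfect one. What the model keeps afterwards is those adjusted numbers, not the images.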

My illustration is no longer part of the model. The model destroyed my illustration, rebuilt it, and it remembers how to do it. Having gone through that process 600 million times, you end up with a model that no longer needs the original images.

When it is used to generate new images, it can start at the halfway point: it can take a noisy image and a string of text and it can remove noise from it step-by-step until it arrives at an image that is recognisable. It has stored away the steps it needs to go through in order to get from a noisy image to an image depicting an illustration of a Victorian steampunk gentleman in top hat and goggles.

And that is how these generative AI art programmes – these diffusion models – work, how they generate images. A user types in a prompt (a string of text), the programme generates a random image filled with pure noise (the “seed” image), and the programme then goes step-by-step through the process of removing little bits of noise from the image until it arrives at something that is recognisable as the thing that was written in the prompt.
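The generation loop can be sketched in the same spirit. Please read this as a toy under loud assumptions: the stand-in predictor below cheats by already knowing the picture it is heading for, whereas a real model has only its trained numbers and the prompt to go on. The step schedule, the four-pixel image and all of the function names are invented for the example.

```python
import random


def make_stand_in_predictor(target):
    """Hypothetical stand-in for a trained model.

    It 'predicts' the noise in an image by comparing it with a target
    picture it already knows. A real model has no such target to look at.
    """
    def predict_noise(image, amount):
        keep = (1.0 - amount) ** 0.5
        mix = amount ** 0.5
        return [(x - keep * t) / mix for x, t in zip(image, target)]
    return predict_noise


def generate(seed_image, predict_noise, amounts):
    """Step from a noisy image towards a clean one, one noise level at a time."""
    image = list(seed_image)
    for current, following in zip(amounts, amounts[1:]):
        predicted = predict_noise(image, current)
        # Estimate what the clean image would be if the prediction is right.
        keep = (1.0 - current) ** 0.5
        mix = current ** 0.5
        estimate = [(x - mix * n) / keep for x, n in zip(image, predicted)]
        # Put back a smaller amount of noise, ready for the next step.
        keep_next = (1.0 - following) ** 0.5
        mix_next = following ** 0.5
        image = [keep_next * e + mix_next * n for e, n in zip(estimate, predicted)]
    return image


rng = random.Random(7)
target = [0.1, 0.4, 0.9, 0.6]                       # made-up four-pixel picture
seed_image = [rng.gauss(0.0, 1.0) for _ in target]  # the random "seed" image
amounts = [0.99, 0.75, 0.5, 0.25, 0.0]              # noise level at each step

result = generate(seed_image, make_stand_in_predictor(target), amounts)
print([round(value, 3) for value in result])
```

Because the stand-in knows its target exactly, the loop lands on that picture. With a genuinely trained model the destination depends on the prompt and on the random starting noise.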

They have automated the process of creating art.

The above process describes how one of my illustrations was taken from my website (without my knowledge or consent) and the various steps by which it ended up being used in the creation of Stable Diffusion. Those same datasets, produced by LAION, have been used in the creation of other generative AI art programmes, including Midjourney.

While the above information did not come in for any criticism or corrections from early readers of this article on LinkedIn, it became clear that the conclusions I drew from these discoveries were not properly articulated. Some people were keen to project their own conclusions onto my article; things that were not what I had intended to imply. As such, I have laid out below my three key takeaways from the discoveries made above.

Conclusion 1: Diffusion models are nothing without their training data

A diffusion model that hasn’t been trained is a diffusion model that cannot generate images.

If you are a company wanting to sell a product to users, then that product needs to be usable. While there is no doubt that a lot of hard work and innovation goes into the creation of the model itself, you cannot sell an “empty” diffusion model to your customers. It won’t do anything.

In order to make their product usable, companies such as Stability.AI or Midjourney need to train their diffusion models. That is what adds value to their product. It is what makes their product (their generative AI art programmes) attractive to users. These companies had a choice about how they went about this process. For example, they could have:

  • Used only public domain images or images released under Creative Commons; and/or
  • Licensed images from the image owners (the artists, photographers, marketeers etc) and used those

Arguably, these approaches would have created “better” datasets. You are not going to end up with CSAM images in your dataset if you take these options, and you may remove some of the biases that would otherwise be present. (That’s a discussion for another article.)

They did not take either of these approaches, however. They chose a third option: scraping as much as they could from the internet and using that.

I want to emphasise what an opaque and convoluted journey my poor steampunk character was taken on. Is it really right that someone can collect data from websites in the name of research, hand it off to another company to parcel it up (again in the name of research), before passing it to a University research team to use in the creation of a model, ostensibly in pursuit of academic research? Maybe. Several different jurisdictions have exemptions from copyright laws for work carried out for research purposes 13.

Is it right that the result of all of that academic, not-for-profit, research activity can somehow morph into a generative AI art programme now controlled by a for-profit company, a company receiving billions of dollars in investment capital 14?

That doesn’t sit well with me. If a company is creating a product, and they choose to use the work of other people in order to create that product, then I believe those people should consent to that happening and be appropriately compensated for it.

Conclusion 2: Generative AI art programmes do not learn like humans

I would like to offer the process described above as a refutation of the claim that “AI models learn just like human artists”. Clearly they don’t. No artist learnt how to paint or draw by vacuuming up billions of images from the internet before systematically destroying them and working out how to “undestroy” them again.

If I simply wanted to look at each of the images in the LAION-Aesthetics V2 dataset, and assuming I could look at one image every second, it would take me over 19 years to get through all 600 million. Without breaks. And I would need to take breaks.
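The sum behind that figure is short enough to check. The only assumptions are the ones stated above (600 million images, one second each, no breaks) plus a year counted as 365.25 days.

```python
IMAGES = 600_000_000
SECONDS_PER_IMAGE = 1
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25  # seconds in an average year

years = IMAGES * SECONDS_PER_IMAGE / SECONDS_PER_YEAR
print(f"{years:.1f} years")
```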

Even if you consider the well-trodden path that many artists take of making studies of classical paintings (eg painting a copy of a Turner or a Vermeer), that is still not comparable to the way in which these models were trained. A human artist makes a study of a painting in order to understand how the original artist created that image, how they used a spot of paint here to create the illusion of light or how they composed the scene to draw the viewer’s eye from one part to another. By making your own copy you hope to learn a little bit of their technique, which you can then apply to your own art.

The model knows nothing about the techniques used to create the art it ingested. It doesn’t understand anything about what is depicted in the image. Being able to recognise a cat as distinct from a chair or a lamp does not mean it understands anything about what a cat is, what a lamp is used for, how a chair works. Destroying a Vermeer does not tell the model anything about how Vermeer painted. It just knows how to remove noise from an image, step-by-step, until it arrives at something that has the shape of a Vermeer.

Conclusion 3: Gone but not forgotten?

This is an open question more than a firm conclusion.

It is undeniable that the Stable Diffusion model contains no images. When it generates a new image it is not piecing together bits and pieces of existing images into a collage. As we have seen, it generates images by starting from an image made of pure noise and slowly removing noise from it.

But I believe that the ghost of my illustration lives on inside Stable Diffusion. The model “remembers” the process it went through in order to get from an image made of noise to an image that looks like my illustration. If I gave it the right set of inputs, would I be able to get Stable Diffusion to generate my illustration anew?

Researchers from the University of Maryland and New York University wondered the same thing. They tested whether they could get Stable Diffusion (and other generative AI programmes) to generate images that were similar to images used during the training process; what they call “content replication”.

On the one hand, they found that the larger the dataset, the less content replication they observed. On the other, they found numerous examples of images generated in Stable Diffusion that included a “non-trivial amount of copying” from the training images. Sometimes that was an entire image (it did a pretty good job of recreating “The Scream” from scratch) and sometimes just sections of an image.

“The goal of this study was to evaluate whether diffusion models are capable of reproducing high-fidelity content from their training data, and we find that they are. While most of the generations from large-scale models do not contain copied content, a non-trivial amount of copying does occur.” 15

It is unlikely that I could direct Stable Diffusion to produce my illustration exactly. But there is a greater than zero possibility that someone, at some point, might type in a prompt and Stable Diffusion spits out something that looks an awful lot like my illustration.

Multiple rounds of experimentation have been carried out by Reid Southen, a concept artist, to test whether he can recreate famous scenes from films and other media using these generative AI tools. It seems it is surprisingly easy to get the models to generate images that are almost 1:1 matches for frames from popular films, often without directly asking for those characters or that scene by name 16.

And while a user might easily spot when the model chucks out an image featuring Iron Man or a character from Toy Story, they would be forgiven for not noticing if it generates something that is a clear match for a more obscure piece of art vacuumed up and included in the training dataset.

Offering your customers a product that might serve them up a copyright-violating recreation of an existing image is not, in my opinion, a good outcome.

And lastly

I think it is important to mention that I was only able to piece this all together because Stable Diffusion had its origins in academia and has been released as an open source model. For which, I guess, they deserve some credit.

Not all generative AI art programmes are as open as this. While it is understood that LAION-5B makes up part of Midjourney’s dataset, we don’t know all of the images that were used to train it. Adobe have released some information about how they trained Firefly, but there are still a lot of questions swirling around. While the process of working out how my art was used to enable Stable Diffusion was convoluted and difficult, it is next-to-impossible to know how much of my work went into Midjourney’s creation.

This shouldn’t be how these things work. As we have seen, these models could not operate without the millions (and, in later versions, billions) of images that were used to train them.

Imagine a wind-up toy. However intricate it is, the toy is not going to move until a human hand comes along and winds it up. That hand may then be removed and the toy may totter off unaided; other hands may prod and poke at it as it meanders forward, convincing themselves that they are guiding its progress. But it is nothing without that initial input of human effort.

When it comes to these generative AI art programmes, that human effort is our art, our photos, the work of millions of humans, pumped into the system without our knowledge or consent. And without that initial input of human effort, all you have is a cleverly designed toy that isn’t going anywhere.

A long and winding road

If you have made it this far, thank you. It has been a far more convoluted journey than I expected.

I really didn’t want to have to learn about all of this; I was perfectly happy just painting pictures for people. But once I knew that my art had played a part in enabling these art automatons, I felt a responsibility to find out more.

If you are one of the millions of artists affected by these generative AI art programmes, if you have found yourself wondering how and why your art ended up inside them, then hopefully this exploration of the journey my illustration went on will help explain the process.

  1. ↩︎
  2. ↩︎
  3. ↩︎
  4. ↩︎
  5. ↩︎
  6. ↩︎
  7. ↩︎
  8. ↩︎
  9. ↩︎
  10. ↩︎
  11. ↩︎
  12. ↩︎
  13. ↩︎
  14. ↩︎
  15. ↩︎
  16. ↩︎