On visual AI and stealing

There’s plenty of misconception on the Internet regarding AI that composes images – how it works, whether it steals existing art, and whether it will replace human artists. The group most affected by these misconceptions seems to be the artists themselves. It’s natural human behaviour to react with hostility to things we don’t understand and believe can harm us (in this case, make us redundant). That reaction is perfectly understandable and worth acknowledging. What’s not worth acknowledging is the lack of effort to understand the problem. Since you’re on this page, you’ve evidently decided to make the effort. Enjoy, then, as this article aims to clear up a few common misconceptions.

How an AI learns

Basically, all the inferring, thinking and reflecting in AI is done by multiplying big matrices. Current neuroscience seems to support this view. Imagine – all your hopes, dreams and thoughts basically boil down to voltages and impulses in your brain, or, in the case of computers, to numbers in neatly packed matrices (tables of numbers). You can disagree – but it won’t change the reality. In mathematics you can define operations such as adding two of these tables together, or multiplying them. From those you can construct more advanced operations, such as exponentiation, differentiation or integration.
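
To make that concrete, here’s a minimal sketch in Python (using numpy) of what a single “layer” of a neural network does. All the numbers and names here are made up for illustration – the point is simply that it’s a table multiplication plus a simple non-linear function.

```python
import numpy as np

x = np.array([0.2, 0.7, 0.1])        # input: three numbers describing something
W = np.array([[0.5, -0.3, 0.8],      # parameters: a 2x3 table of numbers
              [0.1,  0.9, -0.4]])
b = np.array([0.05, -0.02])          # a small additive correction ("bias")

# One "layer": multiply the tables, then clip negatives to zero (ReLU)
output = np.maximum(0.0, W @ x + b)
print(output)
```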

Regarding how an AI learns – this is easy to explain if you’re well-versed in mathematical analysis, but I’ll try to explain it regardless. Let’s assume that we’re creating a neural network that, given the picture of a flower, will tell us what kind of flower it is. Imagine that you have a certain input (e.g. a picture of a flower) and the required output for it (e.g. the kind of flower it is). You assemble many, many such pairs until you have enough to constitute your training set. You also need to store what the network learns. That is kept inside the matrices that constitute the so-called parameters – the proper name for the current state of knowledge within the network. You can now state it mathematically: the output (what the network says the flower is) depends on the input and the parameters. Of course, you can’t start from nothing. You set the initial values of the parameters to something random, and update them to represent what the network learns. There are a few different recipes for picking those initial random values, and none clearly works best, so the choice ultimately comes down to taste.
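
A toy illustration of that idea, with made-up sizes: the parameters start as random numbers, and the output is purely a function of the input and those parameters. The init recipe and feature values below are assumptions picked for the example, not any particular library’s defaults.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical flower classifier: 4 measured features in, 3 flower kinds out.
# The parameters start random; they only become "knowledge" through training.
W = rng.normal(scale=0.1, size=(3, 4))   # one of several common init recipes
b = np.zeros(3)

def network(x, W, b):
    """Output depends only on the input and the current parameters."""
    scores = W @ x + b
    exp = np.exp(scores - scores.max())  # softmax: turn scores into probabilities
    return exp / exp.sum()

flower = np.array([5.1, 3.5, 1.4, 0.2])  # e.g. sepal/petal measurements
print(network(flower, W, b))             # nonsense at first; training fixes that
```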

Naturally, at first, the network doesn’t have a clue about flowers. Given its first image (or a bunch of images – you can average over those as well), it will output complete bullshit, i.e. its real output will differ from the desired output. Fortunately, mathematics has a way to tell us what should change to get closer to the correct output: the derivative of an error function. The derivative shows in which direction the parameters would have to move for the error to grow even larger, so by subtracting the value of the derivative from the parameters we gradually make that error smaller. The network is learning, and the only thing we needed to change were the parameters. Of course, by increasing the number of parameters we can allow the network to learn more complex problems (which brings some other problems that I’ll talk about further on). On the other hand, networks with too few parameters can be unable to sufficiently generalize the problem, i.e. they will suck.
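
Here’s a deliberately tiny sketch of that loop on a one-parameter toy problem, so the derivative can be written out by hand. The learning rate and step count are arbitrary choices for the example.

```python
import numpy as np

# Fit y = w * x by gradient descent. The derivative points in the direction
# that makes the error GROW, so we step the opposite way by subtracting it.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                      # the "desired output": the true w is 3

w = rng.normal()                 # random starting parameter
learning_rate = 0.1

for step in range(50):
    prediction = w * x
    error = np.mean((prediction - y) ** 2)        # how wrong we are
    gradient = np.mean(2 * (prediction - y) * x)  # derivative of error w.r.t. w
    w -= learning_rate * gradient                 # step downhill
print(w, error)  # w ends up close to 3, error close to 0
```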

The problem with that

There’s one problem with learning this way. The network can take one of two routes to solve our problem:

  • try to understand which kinds of flowers have which properties – i.e. actually understand the problem. This is the optimal case.
  • learn all the examples by heart – this is called overfitting, and it’s as good an idea as it sounds. The network will respond correctly to inputs it has already seen and return bogus data for everything else.

Naturally, having more parameters than necessary outfits the network with sufficient brainpower to memorize everything, but we do our best to keep that from happening, as such AI models are simply bogus and useless. There is a bunch of ways to prevent the network from learning its examples by heart, collectively called regularization. Before deep learning, this was usually done by artificially constraining the parameters with an extra penalty formula (e.g. punishing large parameter values); nowadays it’s commonly done with dropout, i.e. randomly silencing some neurons during each training pass. Training a network is an iterative process: we repeatedly show the network our inputs and modify its parameters, and repeat until the model converges, i.e. reaches the desired quality. Naturally, quality is then measured on a held-out subset of the data, never shown to the network during training (so it doesn’t get a chance to learn it by heart). This is called the test set or validation set.
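
For the curious, here is a sketch of the dropout trick itself – this is the common “inverted dropout” formulation, with the keep probability picked arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(activations, keep_probability=0.8):
    """Randomly silence a fraction of neurons during a training pass.

    Inverted dropout: surviving activations are scaled up so the
    expected output stays the same as without dropout.
    """
    mask = rng.random(activations.shape) < keep_probability
    return activations * mask / keep_probability

layer_output = np.array([0.5, 1.2, 0.0, 2.3, 0.7])
print(dropout(layer_output))   # a different random subset survives each call
```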

Besides, a network is normally not shown just a single image during each loop of this learning experience. It’s shown a whole batch of images at once, and the error (and its derivative) is averaged over the batch – this approach is called minibatch training.
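
Continuing the toy problem from before, a minibatch step might look like this – note that it’s the derivative of the error that gets averaged over the batch, and the batch size is an arbitrary pick:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 3.0 * x
w, batch_size, learning_rate = rng.normal(), 32, 0.1

for step in range(200):
    idx = rng.integers(0, len(x), size=batch_size)   # pick a random minibatch
    xb, yb = x[idx], y[idx]
    gradient = np.mean(2 * (w * xb - yb) * xb)       # average over the batch
    w -= learning_rate * gradient
print(w)  # again converges toward 3, but each step is cheap and noisy
```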

However, the interpretation of the parameters – namely, what the network has learned – currently evades our grasp. There is significant research in this area, but so far it’s very hard to determine exactly what a network has learned. It makes about as much sense as putting a slice of a human brain under the microscope and trying to determine what’s stored there. All we know is that the network learned something. Another problem is that the network can’t tell you why it reached a particular conclusion, or produced the output it’s giving you, other than “because I say so”.

So, an image-building AI learns a description of an image, but the description is first reduced to numbers. Then it’s mixed with a few other images and their descriptions. Mixing textual descriptions doesn’t make sense, but it does once they are numbers. Such networks can be used either to generate images or – by simply swapping inputs and outputs – to generate a description for an image. I’ll let the reader answer the question: does that constitute stealing any more than what humans do when they visit an art gallery? If an AI learned an image by heart and then actively reused those very pixels to construct its output, the answer might be yes – but as I explained before, we do our best to prevent the AI from doing exactly that. An interesting field of research would be to analyze what networks generate from single-word prompts, and try to describe what the network perceives as the visual definition of that word. At the end of the day, everything boils down to a set of numbers and multiplying big matrices. This, apparently, is what intelligence is. Your brain also multiplies big matrices – there’s considerable evidence for that – but does it so fast that computers able to calculate at that speed are, supposedly, still some eight years away.
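
To illustrate “mixing descriptions once they are numbers”, here’s a toy sketch. The encode and decode functions below are pure stand-ins I made up for the example – a real image model’s encoder and decoder are huge learned networks – but the arithmetic performed on their numeric outputs really is this simple:

```python
import numpy as np

def encode(image):
    return image.flatten()            # pretend: image -> vector of numbers

def decode(vector, shape=(2, 2)):
    return vector.reshape(shape)      # pretend: vector -> image

cat = np.array([[0.9, 0.1], [0.2, 0.8]])
dog = np.array([[0.3, 0.7], [0.6, 0.4]])

mixed = 0.5 * encode(cat) + 0.5 * encode(dog)  # averaging "descriptions"
print(decode(mixed))                           # a new image, belonging to neither
```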

Ever since Hubel and Wiesel did their experiments on cats (and determined what the first layer of visual processing in the brain does), we have tended to model the way our “artificial brains” work on the one structure in the Universe that we know for sure exists and is intelligent: our own brains. If our artifacts worked very differently and still succeeded, that would mean we had cracked the code of intelligence. Until we do, our only hope of creating something that will one day be as smart as us is to copy the way humans are built and operate.

Questions worth asking

So now it isn’t about who has the best algorithms – only who has the best data. If copyrighted works are used to construct an AI, who should hold the copyright on its output? If a human is inspired by some piece of art and creates their own – or replace the human with an AI – who should hold the copyright on the resulting work? The assumption that, since humans are intelligent, only they can take reference, and any other form of creative work built on existing work constitutes stealing, is, in my opinion, characteristic of a narrow mind. So that’s the true problem: when does taking reference become stealing? The answer should not depend on who is doing the taking – a human or an AI. As I argued previously, all displays of intelligence are an effect of multiplying big matrices. Technology won’t replace people, but people who know how to use technology are sure to replace people who don’t.

So, some of the better questions to ponder:

  • how should the copyright system be structured, or changed, to reflect content made by non-humans?
  • when does taking reference stop and stealing start?
  • are only humans capable of taking reference and not stealing? What quality grants them this right?
  • how do we reward human artists for creating the content that AI later learns from? Do we reward them for “giving reference” to other human artists?
  • how can we use AI to simplify, or enhance our current work?
  • what happens when AI starts to learn from art made exclusively by other AIs?

One last, closing argument. Do you happen to know how much information, in gigabytes or terabytes, you experienced before reaching the age of 18? I bet it’s an order of magnitude larger than the current size of the Internet. And you had to experience all of it, i.e. process all of that data, to become the person you are now. AI won’t be exempt from that rule. Everything you know comes from your experience, and experience can be digitized. So I suppose that true AI intelligence will naturally emerge from putting enough computing power and data in one place. Therefore it doesn’t matter who has the best algorithms, since even stupid humans are brilliantly intelligent. Ultimately the quality of the data matters little – as one man put it, quantity has a quality all its own. What matters is who has the largest amount of data.

Published by Piotr Maślanka

Programmer, certified first aider, entrepreneur, biotechnologist, expert witness, mentor, former PhD student. Your favourite renaissance man.
