Copyright, Situating Search, and other links & notes

27 February 2023 – Baldur Bjarnason

An overview of the AI copyright situation, again

Prompted by news that the US Copyright Office doesn’t consider AI-generated art to have copyright protection, I failed to resist the urge to be a reply guy on social media and, well, replied with explanations. The following passage is a light rewrite of some of my comments on Mastodon.

Based on the research I’ve been doing, it’s pretty clear cut that images an AI generates from a prompt won’t have any copyright protection in the US or the EU. This isn’t just a position taken by the US Copyright Office.

The rules for derived works are less clear and will probably need to be assessed individually. Output from a tool like Scribble Diffusion, for example, would likely have to be assessed based on the originality of the sketch.

The EU rules are even less clear than the US ones because they vary across the region. But the consensus seems to be that prompt-generated works are very unlikely to be protected but that derived works might be, depending on the process. A further complication in the EU is that, in some jurisdictions, the software author and even the owners of the training data might technically share ownership of the work.

The same issues apply to generated text, by the way.

The hurdle for prompts is the originality bar, which is also an issue for copyrighting jokes and tweets: the shorter the work, the higher the bar for originality. Otherwise, you’d end up with the entire English language covered by somebody’s copyright somewhere.

So, prompts seem to be too short to qualify as original works, which means that any generated art derived from them won’t be an original work either.

At least, that’s the reasoning in the papers on the subject I’ve been reading.

Here’s the salient bit from a paper based on a study commissioned by the European Commission on AI and copyright:

In such cases, however, except for the user-generated prompt it will be difficult to identify any creative choice by the human user in the conception, execution or redaction phases. Consequently, any AI-assisted output generated by such systems would probably not qualify as a “work”.

That paper/article is a reasonably clear overview of the state of play for AI works and copyright protection in the EU. It also highlights the issue where some territories, like Ireland and the UK, have a lower bar for copyrightability and how that could lead to situations where an AI-generated work is protected in the UK and Ireland, but not in the US and the EU.

Because of this lack of protection, “prove this is/isn’t generated by AI in a court of law” is definitely going to become a thing.

Sometimes it might be simple. An original, full-resolution file is generally going to prove that an image wasn’t AI-generated. (AI-generated files are low-res, then upscaled, and will be for quite a few years to come.) Or you could show a PSD or Illustrator file with layers and objects.

It gets trickier with text, though. We’re helped there by the fact that tools like ChatGPT are quite bad at longer texts.

But, if all else fails, the AI industry being so centralised might simplify things: just subpoena OpenAI’s logs?

On “Situating Search”

The “Situating Search” paper by Emily M. Bender and Chirag Shah is a really excellent analysis of why attempts to marry large language models with search are probably misguided.

On Google’s proposal for integrating LLMs and search:

Technical flaws. First, it is flawed technically in that it is based on misconceptions of the technical capabilities of language models. For example, point (6) is best understood not as an ‘unsolved problem’ but rather a category error. Nothing in the design of language models (whose training task is to predict words given context) is actually designed to handle arithmetic, temporal reasoning, etc.

Also a great point:

But they provide no citations nor other evidence that anyone has been asking for such a system. We thus read this ‘promise’ as not coming in response to a demand from users, but rather as a technologist’s dream.

I could honestly quote something from every page of this paper. Go read it.

Then go read Marcia J. Bates’ classic 1989 “berrypicking” paper for bonus points.

Web Dev Links

“An update on Robust Client-Side JavaScript - molily”
“How Shadow DOM and accessibility are in conflict”
“Chrome’s Headless mode gets an upgrade: introducing --headless=new - Chrome Developers”. This did not fix Chrome’s longstanding hyphenation bug for headless Chrome on non-Mac platforms.

The rest of the links on the ongoing AI nonsense

“GPT is an unreliable information store - by Noble Ackerson”

Ok so obviously it got one fact half right, and then made up pretty much every other fact is upsetting. I’m pretty sure I’m still alive.

‘This AI chatbot “Sidney” is misbehaving - Nov 23 2022 Microsoft community thread’

It exhibits all of the same misbehaviour that came to light in the past few weeks

“A Concerning Trend – Neil Clarke”

I’ve reached out to several editors and the situation I’m experiencing is by no means unique

AI tools are going to make the spam problem exponentially worse.

A Sci-Fi/Fantasy short story publisher had to close submissions because they were being overwhelmed by people submitting AI-generated stories.

“How much of AI’s recent success is due to the Forer Effect? – Terence Eden’s Blog”

I hadn’t made this connection, but it seems obvious in hindsight. A lot of the ‘magic’ in generative AI is in the eye of the beholder. See also: “People keep anthropomorphizing AI. Here’s why”

“Place your bets - Charlie’s Diary”

This is what I’ve been saying all along 🙂

Specifically, this is what I was talking about in my “Generative AI is the tech industry’s Hail Mary pass” letter from a few weeks ago.

“The AI Crowd Is Mad”

I don’t want to say that these qualities can’t or won’t be achieved, I’m trying to argue that investors are already pricing in these non-existent features.

The scene is priced as if all of its problems are already solved. This might be a brilliant gamble. Or it might be utter foolishness.

‘Don’t believe ChatGPT - we do NOT offer a “phone lookup” service’

I don’t think ChatGPT users have got the message that it doesn’t actually deal in facts or factual research.

I can’t imagine being 18, wondering if there will be any way to get a job in the arts when every entry level art job will be taken up by AI by the time they’re halfway through art school.

I’ve been a fan of Rosalarian’s work for years and I think they’ve hit the nail on the head here.

This week’s obsessive listens

I’ve been doing a bit of catching up on some of the Icelandic music that was released while I was living abroad.

This one in particular just completely stuck in my brain, for some unfathomable reason:
