Regulating AI (plus links & notes)

27 March 2023 – Baldur Bjarnason

How to regulate AI

The problem with regulating AI isn’t in coming up with effective regulations. That’s the easy part.

The problem is that tech and the authorities don’t want effective regulation because “effective” in this context always means “less profitable for the industry”.

Adobe stands out as the only prospective AI vendor who listened to their legal team

“Adobe made an AI image generator”

That it's trained only on images Adobe explicitly has the rights to train on is a move in the right direction, as is bringing out-painting to Photoshop. But this is Adobe, which is genuinely one of the most disliked companies in existence.

And I’d like to emphasise that it’s one of the most disliked companies in existence for very good reasons: they keep acting like shits towards a customer base that they assume is captive, and actively try to keep captive.

(I mean, so does Microsoft, but for some reason people seem eager to forgive them. I guess designers and artists have longer memories than coders and middle-management.)

If I had to guess, this’ll be popular with enterprises who want to use generative AI to cut down on costs and reduce their workforce, because they’ll assume that they’re less likely to get sued for using this than, say, Midjourney or Stable Diffusion.

They’d probably be right about that. Even if the model has the same plagiarism rate as SD (around 1-2%), odds are that whatever you’d be copying came out of Adobe Stock, and Adobe isn’t likely to sue you for using its own tool. So the effective plagiarism rate (i.e. copying from images outside of Adobe Stock, which you could then get sued for) is likely to be considerably lower than SD’s or Midjourney’s, possibly an order of magnitude lower.
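To spell that arithmetic out as a rough back-of-the-envelope sketch: let p be the rate at which the model reproduces a training image, and s the share of those reproductions that come from Adobe Stock. s is a hypothetical quantity here, not a measured figure. Only the remainder carries the legal risk described above:

```latex
p_{\text{eff}} = p\,(1 - s), \qquad p \approx 0.01\text{--}0.02
% an order-of-magnitude drop means p_eff <= p/10:
p\,(1 - s) \le \tfrac{p}{10} \iff 1 - s \le 0.1 \iff s \ge 0.9
```

So "an order of magnitude lower" holds exactly when at least 90% of the copies come out of Adobe Stock. That would put the effective rate at or below roughly 0.1-0.2%. Whether s is really that high is an assumption.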

So, it’s looking like Adobe’s strategy with generative AI is to be the least sleazy company in the room (made easy by the others being extremely sleazy).

According to the FAQ, they don’t train on customer data. The training set is Adobe Stock, CC0-licensed images (i.e. effectively public domain), and public-domain images.

And they want to standardise disclosure metadata that automatically gets attached to files where generative AI has been used. (Which should sound familiar to those who read what I wrote above on regulation.)

I remember writing a while back that the biggest opportunity for diffusion models was to integrate them as features into existing creative tools. Midjourney and Stability AI obviously went in the “fire your illustrators and photographers and replace them with this ad hoc lawsuit magnet with a garbage UI” direction instead. Unless Adobe fumbles this, it’s likely that they’ll slowly take over this section of the generative AI market.

(Ugh. How bad is the state of AI startups when Adobe stomps into the room with their reputation and baggage and still manages to look like the most trustworthy and ethical organisation in the field? Like, what’s wrong with the rest of you? Why did you let this happen?!)

It’s in the nature of social media discourse to diverge into extremes. So, it shouldn’t have surprised me to see so many people just decide that Adobe is outright lying when it says that it isn’t training its AI on user data.

But it did. It shouldn’t have, but it did.

In this case I’d argue that Adobe’s stance is driven by the profit motive, not altruism, which IMO makes them more likely to follow through. Adobe’s customers are the creative industries and risk-sensitive enterprises. Being the only maker of a generative image model that gets easy approval from both the legal department and the designers is likely to be a license to print money.

Also, not getting sued is usually considered a plus.

If Adobe sticks to the plan they’ve announced, the question as to whether Adobe’s AI tools will be ethical is now mostly a question of how Adobe Stock is run.

But either way, any fallout from that is unlikely to affect Adobe Firefly users, which is what you want if you’re a business.

On the Internet Archive Thing

Whatever else you think about the Internet Archive case, it does not bode well for AI models. It would seem to indicate that tech’s fair use exceptions for hoovering up data are much narrower than the industry commonly thinks.

(And I’m going to get so many angry comments for what I’m about to say…)

I’m a fan of the IA and like the work they do. But they deliberately picked a fight with an unsympathetic adversary. To outside observers, it absolutely looked like the primary purpose of the National Emergency Library was to get sued in an attempt to carve out a novel copyright exception.

They must have known it was impossible for publishers to ignore, and I think they truly believed the law was on their side.

It backfired spectacularly. And that’s kinda on them?

And I can’t stress this enough: if you wanted to design a project for the Internet Archive whose objective was “guarantee that we get sued into oblivion and risk destroying everything we’ve built”, what you’d end up with would look identical to the National Emergency Library.

That nobody at the IA seems to have spotted this is extremely concerning to those of us who are generally in favour of what they are trying to do.

When a jeans company needs that AI stock bump

You see a fashion co revealing their racism and ignorance. I see a multinational attempting a risk-free “AI stock bump” without compromising their processes with untested tech and instead stumbling into a gloriously stupid PR disaster that reveals them to be racist and dumb.

There’s going to be a lot of this over the next few months. Companies want the AI stock bump much like they wanted that blockchain stock bump.

But adding untested tech from tech cos that promise the world doesn’t have a great track record. See IBM Watson. Or every blockchain project, for that matter. Genuine attempts to integrate these things too early usually turn into extremely expensive disasters.

So, superficial stuff is going to be the order of the day for as long as the stock market rewards it.

Software Dev and Tech Links

“Content negotiation considered harmful - snarfed.org”. I used to be a big fan of HTTP content negotiation (where one URL can return a variety of formats depending on what the client supports), but, yeah, it usually introduces more problems than it fixes.
“The Online Photographer: How Is TOP Doing? (Blog Note)”. Looks like all of the old photography mainstays, or what remains of them, aren’t doing so well.
“The problem with Don Norman’s new book”. I used to be a big fan of Don Norman, but we have better resources now, better writers, more inclusive thinking on usability. We don’t need a new Don Norman book.
“Web fingerprinting is worse than I thought - Bitestring’s Blog”
“You’re Doing It Wrong: Notes on Criticism and Technology Hype”. On criti-hype and how critics often paradoxically echo the hyperbolic promises of the tech industry.
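For anyone who hasn’t run into content negotiation, here is a minimal sketch of the server side: one URL, several representations, and the server picks one by matching the client’s `Accept` header against what it can produce. The function names (`parse_accept`, `matches`, `negotiate`) are hypothetical. The matching is deliberately simplified: it ranks purely by q-value and ignores the specificity and explicit-exclusion precedence rules in the HTTP spec.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media.lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a range like "text/*" or "*/*" covers a concrete type like "text/html"."""
    if media_range == "*/*":
        return True
    range_main, _, range_sub = media_range.partition("/")
    type_main, _, type_sub = media_type.partition("/")
    return range_main == type_main and range_sub in ("*", type_sub)


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Return the representation to serve, or None (a server would answer 406)."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"; this sketch just skips the range
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

For example, `negotiate("application/json, text/html;q=0.9", ["text/html", "application/json"])` returns `"application/json"`. Even this toy version hints at where the trouble starts. The same URL now has several possible bodies, so every cache in between has to key on the `Accept` header (via `Vary: Accept`), or it can end up serving JSON to a browser that wanted HTML.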

AI links

“The venture capitalist’s dilemma: The embarrassing investor meltdown surrounding Silicon Valley Bank should drive us to consider new models.”
“Disruption Killed Innovation”
“AI Takes Over Because of Human Hype, Not Machine Intelligence - Jim Nielsen’s Blog”
“Here are more details on Ubisoft’s Ghostwriter AI tool from GDC 2023”
“OpenAI’s policies hinder reproducible research on language models”
“The Uncanny Valley”. “People are justifiably frustrated at the fact that the world’s most powerful people are simultaneously incompetent and invincible, ever protected from the consequences of running a risky, rotten economy.”
“ChatGPT-4 produces more misinformation than predecessor - NewsGuard”. This shouldn’t come as a surprise. In the research I’ve read, hallucinations are an emergent property that increases with the size of the model.
“Adactio: Journal—Disclosure”. Not disclosing that something is AI-generated is so obviously unethical that I expect the tech industry to fight any and every attempt to mandate disclosure tooth and nail.
Tech cos keep laying off entire responsible/safe AI teams in one go.
“Peerless Whisper – Eric’s Archived Thoughts”. Transcription and captioning are exactly the sort of thing we should be using these systems for.
“The machines won’t save your design system”. “We have too much confidence in our ability to make technology better, and not enough respect for its ability to make our lives worse.”
“Some Thoughts on Five Pending AI Litigations - Avoiding Squirrels and Other AI Distractions - The Scholarly Kitchen”. Good overview of the pending cases. And, as it points out, EU-based lawsuits are likely to follow.
“Privacy Violations Shutdown OpenAI ChatGPT and Beg Investigation”. Almost as if centralising an entire industry on the services provided by a couple of companies was a bad idea. (A bad idea that tech loves: see AWS.)
“Don’t trust AI to talk accurately about itself: Bard wasn’t trained on Gmail”
“Chatbots, deepfakes, and voice clones: AI deception for sale”. From the US FTC. It also looks like pre-existing regulations might apply quite well to generative AI.
“Great, Dating Apps Are Getting More Hellish Thanks to AI Chatbots”. Tech is digging up its “all regulations lead to monopolies” narrative. Meanwhile, something as simple as requiring that all AI-driven communications be disclosed would prevent a wide range of abuses.
“Poisoning Web-Scale Training Datasets is Practical”. A worrying aspect of the wholesale adoption of LLMs in all of our productivity and coding tools is that it seems feasible to poison their training data, both broadly and targeted, at a fairly low cost.
