If you aren’t afraid of what big companies will do with AI, your threat modelling is off
The problem with regulating AI isn’t in coming up with effective regulations. That’s the easy part.
The problem is that tech and the authorities don’t want effective regulation because “effective” in this context always means “less profitable for the industry”.
That’s why they tend to either come up with ineffective half-measures or measures that strengthen the incumbents' positions.
But, what if you didn’t care about protecting Microsoft’s ability to profit from AI? What would you do then?
If you’re like me and assume that the biggest source of AI-related problems will be companies that develop and integrate AI into their products, not individual criminality and fraud, then you have to directly attack the tech industry’s ability to profit from the tech.
First, you clarify that for the purposes of Section 230 protection (or the equivalent in your jurisdiction), whoever hosts the AI as a service is responsible for its output as a publisher. If Bing Chat says something offensive, Microsoft would be as liable as if an employee had said it.
You’d pass a law requiring tools that integrate generative AI to attach disclosures to the content.
- Gmail/Outlook should pop up a notice when you get an email that their AI generated.
- Word/Docs should have metadata fields and notices when you open files that have used built-in AI capabilities.
- AI chatbots have to disclose that they are bots.
- GitHub Copilot should add a machine-parsable code comment.
You could always remove the metadata, but doing so would establish an intent to deceive.
The point isn’t to create metadata that’s impossible to remove. The point is to discourage the creation of products that automate deception. This differs from cookie banners in that, for better or for worse, the modern web user comes to a site with the expectation of being tracked, but they expect emails from co-workers to have been written by those co-workers. So, the point is to make the practice socially embarrassing and prevent it from becoming the norm.
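To make “machine-parsable” concrete: a minimal sketch of what scanning for a disclosure comment could look like. The marker format here (`# ai-generated: <tool>`) is entirely hypothetical; no such standard exists, and any real scheme would need to be standardised first.

```python
import re

# Hypothetical marker format: "# ai-generated: <tool>". Purely an
# illustration of what "machine-parsable disclosure" could mean.
AI_MARKER = re.compile(r"#\s*ai-generated:\s*(?P<tool>[\w.-]+)")

def find_ai_disclosures(source: str) -> list[str]:
    """Return the tool names declared in ai-generated disclosure comments."""
    return [m.group("tool") for m in AI_MARKER.finditer(source)]

snippet = '''
# ai-generated: copilot-sketch
def add(a, b):
    return a + b
'''
print(find_ai_disclosures(snippet))  # ['copilot-sketch']
```

The value isn’t in the parsing, which is trivial, but in the norm: once the marker exists, stripping it out is an act you can point to.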
All the announced products from Google, Microsoft, and AI startups are primarily for automating deception: chatbots that don’t say they’re AI, AI-generated email and docs, pictures that are presented as photographs.
That’s their go-to-market strategy: automate and normalise deception.
They should be made to understand that this is not okay.
Finally, you’d mandate that all training data sets be made opt-in (or that all of its contents are released under a permissive license) and public.
- Heavy fines for non-disclosure.
- Heavy fines for violating opt-in.
- Even heavier fines for lying about your training data set.
- Make every AI model a “vegan” model.
- Remove every ethical and social concern about the provenance and rights regarding the training data.
—“This would nuke the entire AI industry from orbit”.
I don’t think so?
It’d pop the current bubble, absolutely, but it would also set the industry on a long-term course that’s more likely to result in genuine and sustainable breakthroughs in Machine Learning.
It would force them to design AI integrations properly, with more forethought, making them much more likely to result in genuine productivity enhancements.
Automating deception is more likely to be destructive than productive.
If you think fraud and deception are the point of AI, as they were with the crypto industry, then yes, this will kill that industry.
But, if you think there’s more to the technology, that it could lead to major improvements in the UX of modern computing, then you should embrace any measure that sets the industry on a more sustainable path than the “let’s speed-run the crypto bubble and see if we can make it even bigger” course it’s on currently.
Adobe stands out as the only prospective AI vendor who listened to their legal team
“Adobe made an AI image generator”
That it’s trained only on images Adobe explicitly has the rights to train on is a move in the right direction, as is bringing out-painting to Photoshop. But this is Adobe, which is genuinely one of the most disliked companies in existence.
And, I’d like to emphasise that it’s one of the most disliked companies in existence for very good reasons: they keep acting like shits towards a customer base that they assume is captive, and actively try to keep captive.
(I mean, so does Microsoft, but for some reason people seem eager to forgive them. I guess designers and artists have longer memories than coders and middle-management.)
If I had to guess, this’ll be popular with enterprises who want to use generative AI to cut down on costs and reduce their workforce, because they’ll assume that they’re less likely to get sued for using this than, say, Midjourney or Stable Diffusion.
They’d probably be right about that. Even if the model has the same plagiarism rates as SD (around 1-2%), odds are you’d be copying something out of Adobe Stock and they aren’t likely to sue you for using their own tool. So, the effective plagiarism rate (i.e. copying from images outside of Adobe Stock which you then could get sued for) is likely to be considerably lower than SD’s or Midjourney’s, possibly an order of magnitude lower.
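The back-of-the-envelope reasoning above can be sketched out. Both numbers below are illustrative assumptions, not measured figures: an overall match rate in the 1–2% range, and a guess that the large majority of any matches would come from Adobe Stock itself.

```python
def effective_plagiarism_rate(overall_rate: float, share_from_safe_source: float) -> float:
    """Rate of copying from images you could actually get sued over:
    the overall match rate minus the share coming from a source that
    won't sue you (here, Adobe Stock for Adobe's own tool)."""
    return overall_rate * (1.0 - share_from_safe_source)

# Illustrative assumptions only: 1.5% overall rate, 90% of matches
# coming from Adobe Stock itself.
rate = effective_plagiarism_rate(0.015, 0.9)
print(f"{rate:.4f}")  # 0.0015, i.e. roughly an order of magnitude lower
```

Under those assumptions the litigation-relevant rate drops by an order of magnitude, which is the whole pitch to risk-sensitive enterprises.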
So, it’s looking like Adobe’s strategy with generative AI is to be the least sleazy company in the room (made easy by the others being extremely sleazy).
According to the FAQ, they don’t train on customer data. The training set is Adobe Stock, CC0-licensed images (i.e. effectively public domain), and public-domain material.
And they want to standardise disclosure metadata that automatically gets attached to files where generative AI has been used. (Which should sound familiar to those who read what I wrote above on regulation.)
I remember writing a while back that the biggest opportunity for diffusion models was to integrate them as features into existing creative tools. Midjourney and Stability AI obviously went in the “fire your illustrators and photographers and replace them with this ad hoc lawsuit magnet with a garbage UI” direction instead. Unless Adobe fumbles this, it’s likely that they’ll slowly take over this section of the generative AI market.
(Ugh. How bad is the state of AI startups when Adobe stomps into the room with their reputation and baggage and still manages to look like the most trustworthy and ethical organisation in the field? Like, what’s wrong with the rest of you? Why did you let this happen?!)
It’s in the nature of social media discourse to diverge into extremes. So, it shouldn’t have surprised me to see so many people just decide that Adobe is outright lying when it says that it isn’t training its AI on user data.
But it did. It shouldn’t have, but it did.
In this case I’d argue that Adobe’s stance is driven by the profit motive, not altruism, which IMO makes them more likely to follow through. Adobe’s customers are the creative industries and risk-sensitive enterprises. Being the only maker of a generative image model that gets easy approval from both the legal department and the designers is likely to be a license to print money.
Also, not getting sued is usually considered a plus.
If Adobe sticks to the plan they’ve announced, the question as to whether Adobe’s AI tools will be ethical is now mostly a question of how Adobe Stock is run.
But, either way, the fallout from that is unlikely to affect Adobe Firefly users, which is what you want if you’re a business.
On the Internet Archive Thing
Whatever else you think about the Internet Archive case, it does not bode well for AI models. It would seem to indicate that tech’s fair use exceptions for hoovering up data are much narrower than the industry commonly thinks.
(And I’m going to get so many angry comments for what I’m about to say…)
I’m a fan of the IA. I like the work they do. But they deliberately picked a fight with an unsympathetic adversary. To outside observers it absolutely looked like the primary purpose of the National Emergency Library was to get sued in order to try to carve out a novel copyright exception.
They must have known it was impossible for publishers to ignore, and I think they truly believed the law was on their side.
It backfired spectacularly. And that’s kinda on them?
And I can’t stress this enough: if you wanted to design a specific project for the Internet Archive where the objective was “guarantee that we get sued into oblivion and risk destroying everything we’ve built”, what you’d end up with would look identical to the National Emergency Library.
That nobody at the IA seems to have spotted this is extremely concerning to those of us who are generally in favour of what they are trying to do.
When a jeans company needs that AI stock bump
You see a fashion company revealing its racism and ignorance. I see a multinational attempting a risk-free “AI stock bump” without compromising its processes with untested tech, and instead stumbling into a gloriously stupid PR disaster that reveals it to be racist and dumb.
There’s going to be a lot of this over the next few months. Companies want the AI stock bump much like they wanted that blockchain stock bump.
But, adding new untested tech from tech cos promising the world doesn’t have a great track record. See Watson. Or every blockchain project for that matter. Genuine attempts to integrate these things too early are usually extremely expensive disasters.
So, superficial stuff is going to be the order of the day for as long as the stock market rewards it.
Software Dev and Tech Links
- “Content negotiation considered harmful - snarfed.org”. I used to be a big fan of HTTP content negotiation (where one URL can return a variety of formats depending on what the client supports), but, yeah, it usually introduces more problems than it fixes.
- “The Online Photographer: How Is TOP Doing? (Blog Note)”. Looks like all of the old photography mainstays, or what remains of them, aren’t doing so well.
- “The problem with Don Norman’s new book”. I used to be a big fan of Don Norman, but we have better resources now, better writers, more inclusive thinking on usability. We don’t need a new Don Norman book.
- “Web fingerprinting is worse than I thought - Bitestring’s Blog”
- “You’re Doing It Wrong: Notes on Criticism and Technology Hype”. On criti-hype and how critics often paradoxically echo the hyperbolic promises of the tech industry.
- “The venture capitalist’s dilemma: The embarrassing investor meltdown surrounding Silicon Valley Bank should drive us to consider new models."
- “Disruption Killed Innovation”
- “AI Takes Over Because of Human Hype, Not Machine Intelligence - Jim Nielsen’s Blog”
- “Here are more details on Ubisoft’s Ghostwriter AI tool from GDC 2023”
- “OpenAI’s policies hinder reproducible research on language models”
- “The Uncanny Valley”. “People are justifiably frustrated at the fact that the world’s most powerful people are simultaneously incompetent and invincible, ever protected from the consequences of running a risky, rotten economy.”
- “ChatGPT-4 produces more misinformation than predecessor - NewsGuard”. This shouldn’t come as a surprise. In the research I’ve read, hallucinations are an emergent property that increases with the size of the model.
- “Adactio: Journal—Disclosure”. Not disclosing that something is AI-generated is so obviously unethical that I expect the tech industry to fight any and every attempt to mandate disclosure tooth and nail.
- Tech cos keep laying off entire responsible/safe AI teams in one go.
- “Peerless Whisper – Eric’s Archived Thoughts”. Transcription and captioning are exactly the sort of thing we should be using these systems for.
- “The machines won’t save your design system”. “We have too much confidence in our ability to make technology better, and not enough respect for its ability to make our lives worse.”
- “Some Thoughts on Five Pending AI Litigations - Avoiding Squirrels and Other AI Distractions - The Scholarly Kitchen”. Good overview of the pending cases. And, as it points out, EU-based lawsuits are likely to follow.
- “Privacy Violations Shutdown OpenAI ChatGPT and Beg Investigation”. Almost as if centralising an entire industry on the services provided by a couple of companies was a bad idea. (A bad idea that tech loves: see AWS)
- “Don’t trust AI to talk accurately about itself: Bard wasn’t trained on Gmail”
- “Chatbots, deepfakes, and voice clones: AI deception for sale”. From the US FTC. Also looks like pre-existing regulations might apply quite well to generative AI.
- “Great, Dating Apps Are Getting More Hellish Thanks to AI Chatbots”. Tech is digging up its “all regulations lead to monopolies” narrative. Meanwhile, something as simple as requiring that all AI-driven communications be disclosed would prevent a wide range of abuses.
- “Poisoning Web-Scale Training Datasets is Practical”. A worrying aspect of the wholesale adoption of LLMs in all of our productivity and coding tools is that it seems feasible to poison their training data, both broadly and targeted, at a fairly low cost.