I’m not going to comment on That Banking Thing other than to say that this entire saga just showed us, again, that Miles Bron in Glass Onion was an accurate representation of a certain type of tech/startup/VC guy.
This is exactly the sort of mess and resolution of said mess that is caused by that kind of ignorance, coupled with the power we keep giving to those ignoramuses.
Anyway, I digress…
Keeping up with and assessing AI research
If you find it hard to keep up with AI research, here’s a foolproof rule-of-thumb for deciding which studies to ignore:
If a majority of the authors work for the AI vendor or a major partner, you know the study:
- Will generally show positive effects
- But not effects that are too positive, for credibility’s sake
- Will include at least one solemn acknowledgement of a potential problem before dismissing it
- Will be impossible to replicate, because some core part of it, either the code or the data, will not be public
So, you can ignore it.
This research is what I call advocacy research. It’s the kind of study you’d get if you went to an ad agency and told them you wanted to promote your product but, “y’know, make it sound scientific”.
The study is done by people familiar with the tech, many of whom, consciously or unconsciously, will avoid the system’s glaring pitfalls. It’s much like habitually using a system whose undo-redo stack breaks when you paste something: after a few months you work around the issue automatically, without thinking about it. In this case, the authors have been working with in-development versions of these systems for years.
They’re clued in enough to know which flaws are well-known and need to be acknowledged for credibility. They probably don’t think of it that way; it’s more that those flaws are the first to come to mind. But that’s the effect it has on the study’s structure.
And, finally, if the study’s results look too bad, odds are it will never see the light of day. Just look at what happened to Timnit Gebru. Google’s management felt that suppressing the paper was a reasonable request because, from their perspective, it was a reasonable, par-for-the-course request.
Studies like this are also hit by the counter-intuitive fact that, when it comes to productivity, it’s harder to prove a positive than a negative. (The complete opposite of what you usually see in science.) In workplace studies and organisational and work psychology, it’s notoriously hard to prove that a specific measure genuinely improves productivity.
(Doesn’t stop every company ever from claiming that it can do exactly that.)
Individual studies on a specific productivity intervention commonly suffer from:
- Demand characteristics, where the subjects alter their behaviour to fit what they think the researcher wants.
- The novelty effect, where performance tends to improve just because it’s new and shiny and not because of anything inherent in the intervention.
- The observer-expectancy effect, which especially afflicts advocacy research done by people involved in making the system or intervention: the experimenter’s subconscious biases and familiarity with the system affect the outcome.
- And, as Terence Eden pointed out, AI in particular is vulnerable to the Barnum/Forer effect.
All of this combined means that any study claiming to prove that a specific software intervention has a positive effect needs to be received with much more scepticism than a study showing a detrimental effect.
Because, as it happens, detrimental effects tend to be easier to demonstrate conclusively. Sometimes it’s because they are statistical patterns—memorisation/plagiarism rates, hallucination rates, that sort of thing. Sometimes it’s because the outcome is just obviously worse. And often it’s because adding a roadblock to a process is so obviously a roadblock.
All of which is to say that you should pay more attention when a study shows that AI code assistants increase the number of security issues. And you should pay less attention to studies that claim to prove that AI code assistants increase productivity.
Especially when you consider that we can’t even agree on what productivity is in the context of software development. What you measure changes completely depending on the time horizon. Software development productivity over a day, week, month, or a year are very different things. What makes you more productive over a day can make you less productive over a year by increasing defects and decreasing familiarity with the code base.
So, measuring the productivity increase of a single intervention is extremely hard and the people who do these studies are generally trying to sell you something.
You can measure the overall productivity of a system. But, when you experiment with productivity through a measure-adjust-measure cycle, you end up tampering with the process which commonly magnifies error and variability. (There’s a ton of literature on this. Search for “Deming tampering” or “Deming Funnel Experiment” for examples.)
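Deming’s funnel experiment is easy to reproduce numerically. Here’s a minimal sketch (my own illustration, not from any of the studies discussed here), assuming a stable process with Gaussian noise: Rule 1 leaves the process alone, while the “tampering” rule compensates for each deviation after the fact.

```python
import random
import statistics

random.seed(42)
N = 10_000
noise = [random.gauss(0, 1) for _ in range(N)]

# Rule 1: leave the stable process alone.
hands_off = noise

# Tampering: after each result, nudge the setting by the
# negative of the last deviation from the target (zero).
tampered = []
setting = 0.0
for e in noise:
    result = setting + e
    tampered.append(result)
    setting -= result  # "correct" for the last deviation

print(statistics.stdev(hands_off))  # close to 1.0
print(statistics.stdev(tampered))   # close to 1.41: variance roughly doubles
```

Each adjustment feeds the previous random error back into the next result, so the tampered process ends up with roughly double the variance of the one left alone. That’s the magnified error and variability a measure-adjust-measure cycle produces.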
Reading advocacy research like this is useful for discovering best-case disasters. In the GitHub Codex paper, for example, they say that memorisation (where the model outputs direct copies of code from the training data) happens in about 1% of cases, as if that were not a worryingly high percentage.
(This is even on the GitHub Copilot sales page, under “Does GitHub Copilot copy code from the training set?”. Where it says: “Our latest internal research shows that about 1% of the time, a suggestion may contain some code snippets longer than ~150 characters that matches the training set.” So, the answer to that is yes.)
From that you can fairly safely assume that it happens at least 1% of the time because tech vendors always underestimate disaster.
And, y’know, 1% chance of plagiarism every time you use it should have been a catastrophic scandal for a code tool. But, apparently we’re all fine with it? Because, mixing random GPL-licensed code in with proprietary code is never an issue, I guess?
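To put that 1% in perspective, here’s a quick back-of-the-envelope calculation (my own, not from the paper), assuming the stated 1% per-suggestion rate and, as a simplifying assumption, independence between suggestions:

```python
# Chance of at least one memorised (copied) suggestion over n uses,
# assuming a 1% per-suggestion rate and independent suggestions
# (a simplifying assumption; real usage is likely correlated).
p = 0.01

for n in (10, 100, 500):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n} suggestions: {at_least_one:.0%} chance of at least one copy")
```

Under those assumptions, the odds of at least one copied snippet reach roughly 63% after a hundred suggestions and over 99% after five hundred, which for a daily-use tool is not a rare edge case.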
A 4-Monthiversary sale for Out of the Software Crisis
It’s almost 4 months since I published my book “Out of the Software Crisis”.
I’ve been really happy with the response and wanted to do something cool to mark the occasion.
But I couldn’t come up with any cool ideas, so I decided to just do a sale instead: 20% off!
Sale expires 16:00 GMT, 15 March 2023. Sale is now over!
Until then, you’re benefiting from my inability to come up with something clever.
Out of the Software Crisis: Systems-Thinking for Software Projects
(20% off) (Sale is now over.)
How do you know if you’re going to like it?
Well, the best way is to read bits of it! Which you can do because I’ve posted extracts as essays on my website.
If you enjoy reading these, you’re going to enjoy the book:
- Great apps are rare. The book’s introduction
- Theory-building and why employee churn is lethal to software companies. Why layoffs are usually bad ideas.
- WTF is a Framework? A Kuhnian take on frameworks.
- Programming is a Pop Culture. Always relevant.
These few aren’t extracts from the book but are very much in the same vein and style.
- Small is Successful.
- Tech Companies Are Irrational Pop Cultures. I have a theme going here, if you can’t tell.
- 10x Devs Don’t Exist, But 10x Organisations Do.
- Generative AI is the tech industry’s Hail Mary pass.
Reading through these should give you a very clear idea of whether you’ll like the book or not.
Don’t like the essays? Avoid the book.
Enjoy the essays? Then the book has more.
The book is on sale for the next few days, until 16:00 GMT on the 15th of March 2023. Until then, 20% off. (Sale is now over.)
It Took Me Nearly 40 Years To Stop Resenting Ke Huy Quan - Decider
This is so touching.
Craft vs Industry: Separating Concerns - Thomas Michael Semmler
Seems to be asking exactly the sort of questions we should be asking ourselves at this time.
Lovely to see AI critics split into adversarial factions over Lilliputian which-end-of-the-egg details while the hype crowd stands united around bullshit and false promises.
Vanderbilt Apologizes for ChatGPT-Generated Email
Like I’ve said before, the technical term for somebody who uses AI to write their emails is “asshole”.
Startup Winter is Coming - Stacking the Bricks
A few years old but very accurate.
The long shadow of GPT - by Gary Marcus
I’ve tried hard in my research to be even-handed, to get a realistic sense of the pros and cons of this tech.
But, man, does it seem tailor-made for fraud and misinformation.
These models absolutely do have practical uses. They are amazing at many of the things they do, which is something that recent bubbles like cryptocoins didn’t have.
But I have never seen a technology so perfectly suited to fraud, disinformation, and outright abuse. Yeah, even more than crypto. These models are much more accessible, much easier to use, and have a much wider range of different abuse and fraud vectors.
I’m finding it hard at this point to see how this can be a net win.
Like, a 13-year-old dickweed isn’t likely to be able to pull off a rug-pull and is more likely to be the victim, not the perp, in a crypto scam.
But using his gaming computer to download a bunch of photos off a classmate’s social media account and using Stable Diffusion plus DreamBooth to create deepfakes of her is so easy that I’m convinced it’s already begun to happen.
And that’s just one potential abuse vector for one of these models. They all have dozens, if not hundreds, of abuse vectors.
At the moment, I think the only thing that’s truly holding back large-scale abuses and fraud is the fact that these tools are slow and expensive.
If prices continue to drop (which IMO is necessary for financial viability) then we’re in for a world of hurt.
Abuse and deception tactics with AI are becoming quite sophisticated. Already last autumn, researchers started to notice organised efforts to use AI image generation for astroturfing.
Now, with improved image generation and lower OpenAI API prices, it seems very likely that these astroturfing systems will very quickly become more sophisticated and more realistic. Autogenerated social media profiles. Realistic family photos. LinkedIn profiles more convincing than your own.
Kodsnack discussion with Tim Urban and Torill Kornfeldt
A really interesting discussion between an AI enthusiast and a biologist. Poor audio but worthwhile.
The Best of the Rest
- Influencer Parents and Their Children Are Rethinking Growing Up On Social Media - Teen Vogue
- The Exploited Labor Behind Artificial Intelligence
- Artists can now opt out of generative AI. It’s not enough.
- Can ChatGPT—and its successors—go from cool to tool? - Freedom to Tinker. Where somebody tries to use ChatGPT to do a thing it’s absolutely not capable of.
- ‘Horribly Unethical’: Startup Experimented on Suicidal Teens on Social Media With Chatbot. We’re going to see more of this kind of thing. Hype leads people to cut corners and the hype is just beginning.
- Disability, Bias, and AI :: Aaron Gustafson
- Employees Are Feeding Sensitive Business Data to ChatGPT. I’m guessing that the reason why OpenAI changed their data retention policy the other day is that they knew these stories would be popping up.
- Large Language Models and software as a medical device - MedRegs
- More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models - Papers With Code
- I tested how well ChatGPT can pull data out of messy PDFs. Bullshitting, misgendering, typo injection, and a 1-6% error rate. Using these tools to extract structured data from unstructured documents seems like a huge mistake.
- Briefly, my concern with ChatGPT - by Daniel Zollinger
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models. This is interesting. Seems to indicate that the plagiarism rate is around or in excess of 2%.