As people have been sharing old posts of mine that dig into generative models in some detail, some readers seem to be left with the impression that my concern is largely limited to the AGI science-fiction myth and not with generative models per se.
This is pretty far from the truth, which prompted me to write the following on Mastodon.
Post-hoc explanations based on personal interactions with processes that are substantially random will generally be incorrect. The observed patterns will be random, not systemic.
IOW, everything written about LLMs from the perspective of a single practitioner can be dismissed out of hand. The nature of LLMs makes it impossible to distinguish signal from noise in your own practice.
I suggest not even reading these posts. Our brains are unfortunately wired to mistake confident writing for evidence.
The only studies you should take seriously are larger studies with a meaningfully large sample size, where the design takes care to account for randomness, construct validity (where applicable), and context.
I.e. not studies that claim 200% productivity improvements without noting that the tasks are entirely synthetic (they are, effectively, just testing that text extruders extrude text).
The problem is that LLMs generally don’t function well in larger impartial studies.
Impartiality is vital. Any study that is directly performed or sponsored by a vendor is not to be trusted. Don’t even read it. It doesn’t matter how well it’s constructed: there are too many ways for researcher bias to affect the results of even the most well-structured study. Vendor-affiliated studies are marketing. Nothing more.
Even studies that are independent but are based on confidential data from vendors should be dismissed out of hand. Science needs to be open.
Almost everything positive about LLMs you see today is based on extremely unreliable evidence, much of it from some of the most dishonest people you’ll find in tech (cryptocoiners, effective altruists, etc.).
Given that these models are obscenely power-hungry during a global climate crisis, it’s my opinion that integrating these tools into your daily work is inherently unethical and irresponsible.
That’s without getting into the ethics of how the training data is handled or how these systems are largely being used to replace labour. I know where I stand on those topics, but you don’t need to have an opinion on them to understand that very few people should be using these systems. Just look out of the window and see that we’re in a climate crisis. That should be enough for you to step away from LLMs.
And, no. I’m not going to debate you on this. Any of you. These are complex, large-scale statistical models that defy individual observation. We do not further our understanding of them through personal opinions or social media arguments but through science.
Want to make a point? Get a study published in a peer-reviewed journal.
Besides, these are unreliable tools made by organisations we absolutely should not trust, coming out of a field (AI) with a long history of over-promising, snake oil, and even outright fraud.
The burden of proof is on them, not us.
—But, Baldur, why didn’t you make this argument in your book?
If “don’t do this shit, it’ll make the global climate crisis worse” were a convincing argument in a business context, we wouldn’t have a global climate crisis.
ETA: Regarding this point:
IOW, everything written about LLMs from the perspective of a single practitioner can be dismissed out of hand. The nature of LLMs makes it impossible to distinguish signal from noise in your own practice.
The analogy here is homeopathy or acupuncture. You can’t trust your own experience to be an accurate gauge of the therapeutic benefit of these methods. Our wealth of psychological biases combined with random patterns means that individual experiences are not to be trusted.
LLMs trigger just as many psychological biases:
- Automation bias.
- Confirmation bias.
- Eliza effect.
- Automated cold reading.
- And more.
We need well-constructed, larger-scale studies to even begin to accurately estimate the effectiveness of these systems. If you rely on your own experiences, you will end up becoming the tech equivalent of a naturopath.
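To make the signal-versus-noise point a little more concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not a measurement of any real tool: the function names, the Gaussian noise model, the threshold of 0.25, and the sample sizes are all hypothetical. It models a tool whose true effect is exactly zero and counts how many simulated observers would still come away with an average that looks like a clear benefit.

```python
import random
import statistics


def observed_effect(n_tasks, rng, true_effect=0.0, noise_sd=1.0):
    """Average 'improvement' one observer sees across n_tasks noisy outcomes.

    Assumption: each task outcome is the true effect plus Gaussian noise.
    """
    return statistics.fmean(
        rng.gauss(true_effect, noise_sd) for _ in range(n_tasks)
    )


def share_convinced(n_observers, n_tasks, threshold=0.25, seed=0):
    """Fraction of observers whose average exceeds the threshold.

    The true effect is zero here, so every 'convinced' observer
    has been fooled by noise alone.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fooled = sum(
        observed_effect(n_tasks, rng) > threshold for _ in range(n_observers)
    )
    return fooled / n_observers


# Hypothetical scenario: a handful of tasks each (personal experience)
# versus a thousand tasks each (closer to a large study).
small = share_convinced(n_observers=2000, n_tasks=10)
large = share_convinced(n_observers=2000, n_tasks=1000)

print(f"fooled with 10 tasks each:   {small:.1%}")
print(f"fooled with 1000 tasks each: {large:.1%}")
```

Under these assumptions, a sizeable minority of the small-sample observers see an apparent benefit from a tool that does nothing, while almost none of the large-sample observers do. The sketch only covers random variation. It says nothing about the psychological biases listed above, which would push individual impressions even further from the truth.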