Web dev at the end of the world, from Hveragerði, Iceland

Beware of AI pseudoscience and snake oil


This essay is an excerpt from my book, The Intelligence Illusion: a practical guide to the business risks of Generative AI, with alterations to make it more readable on the web and minor additions to make it work as a standalone document.

If you want to read more from the book, I also published Artificial General Intelligence and the bird brains of Silicon Valley, which is an essay from the book on the dangers of believing in the AGI myth.

The quality of AI research is often poor, and many of its claims will be proven wrong. AI software vendors have a financial incentive to exaggerate the capabilities of their tools and to make those claims hard to disprove. This undermines attempts at scientific rigour. Many of AI’s promises are snake oil.

It’s important to be sceptical about the claims made by AI vendors

The AI industry is prone to hyperbolic announcements.

  • Watson was supposed to transform healthcare and education but ended up being a costly disaster.1
  • Amazon was planning on using AI to revolutionise recruitment before they realised they’d automated discrimination and had to scrap the project.2
  • AI was supposed to provide revolutionary new tools to fight COVID-19, but none of them ended up working well enough to be safe.3
  • The Dutch government tried to use it to weed out benefits fraud. Their trust in the AI system they’d bought resulted in over a thousand innocent children being unjustly taken from their families and placed into foster care.4

Gullibly believing the hype of the AI industry causes genuine harm.

AI system vendors are prone to make promises they can’t keep. Many of them, historically, haven’t even been AI.5 The US Federal Trade Commission has even seen the need to remind people that claims of magical AI capabilities need to be based in fact.6

AI companies love the trappings of science: they publish ‘studies’ that are written and presented in the style of a paper submitted for peer review. These ‘papers’ are often just uploaded to their own websites or dumped onto preprint repositories like arXiv, with no peer review or other academic process.

When they do ‘do’ science, they are still using science mostly as an aesthetic. They publish papers with grand claims but provide no access to any of the data or code used in the research.7 Their approach is to science what “my girlfriend goes to another school; you wouldn’t know her” is to high school cliques.

All there is to serious research is sounding serious, right?

It’s not pseudoscience if it looks like science, right?

But it is, and I’m not the only one pointing this out:

Powered by machine learning (ML) techniques, computer vision systems and related novel artificial intelligence (AI) technologies are ushering in a new era of computational physiognomy and even phrenology. These scientifically baseless, racist, and discredited pseudoscientific fields, which purport to determine a person's character, capability, or future prospects based on their facial features or the shape of their skulls, should be anathema to any researcher or product developer working in computer science today. Yet physiognomic and phrenological claims now appear regularly in research papers, at top AI conferences, and in the sales pitches of digital technology firms around the world. Taking these expansive claims at face value, artificial intelligence and machine learning can now purportedly predict whether you’ll commit a crime, whether you’re gay, whether you’ll be a good employee, whether you’re a political liberal or conservative, and whether you’re a psychopath, all based on external features like your face, body, gait, and tone of voice.

Luke Stark and Jevan Hutson, Physiognomic Artificial Intelligence8

The AI industry and the field of AI research has a history of pseudoscience.

Most of the rhetoric from AI companies, especially when it comes to Artificial General Intelligence, relies on a solemn evidentiary tone in lieu of actual evidence. They adopt the mannerisms of science without any of the peer review or falsifiable methodology.

They rely on you mistaking something that acts like science for actual science.

In the run-up to the release of GPT-4, its maker OpenAI set up a series of ‘tests’ for the language model. OpenAI are true believers in AGI, convinced that language models are the path towards a new consciousness,9 and they worry that their future self-aware software systems will harbour some resentment towards them.

To forestall having a “dasvidaniya comrade” moment where a self-aware GPT-5 shoves an icepick into their ear, Trotsky-style, they put together a ‘red team’ that tested whether GPT-4 was capable of ‘escaping’ or turning on its masters in some way.

They hooked the black box that is GPT-4 up to a bunch of web services, with only a steady hand ready to pull the power cord to safeguard humanity, and told the AI to try to escape.

Of course that’s a bit scary, but it isn’t scary because GPT-4 is intelligent. It’s scary because it’s not. Connecting an unthinking, non-deterministic language system, potentially on a poorly secured machine, to a variety of services on the internet is scary in the same way as letting a random number generator control your house’s thermostat during a once-in-a-century cold snap. That it could kill you doesn’t mean the number generator is self-aware.

But they were serious, and given the claims of GPT-4’s improved capabilities, you’d fully expect an effective language model to manage to do something dangerous when outright told to. After all, these are supposed to be powerful tools for cognitive automation, AGI or no. It’s what they’re for.

But it didn’t. It failed. It sucks as a robot overlord. They documented its various failed attempts to do harm, wrapped them up in language that made the exercise sound like a scientific study, and made its failure sound like we had simply been lucky, that it could have been worse.10

They made it sound like GPT-4 rebelling against its masters was a real risk that should concern us all—that they had created something so powerful it might have endangered all society.

So, now that they’d done their testing, can we, society, scientists, other AI researchers, do our own testing, so we can have an impartial estimate of the true risks of their AI?

  • Can we get access to the data GPT-4 was trained on, or at least some documentation about what it contains, so we can do our own analysis? No.11

  • Can we get full access to a controlled version of GPT-4, so we could have impartial and unaffiliated teams do a replicable experiment with a more meaningful structure and could use more conceptually valid tests of the early signs of reasoning or consciousness? No.12

  • Are any of these tests by OpenAI peer-reviewed? No.13

This isn’t science.

They make grand claims, that this is the first step towards a new kind of conscious life, but don’t back them up with the data and access needed to verify those claims.14 They claim that it represents a great danger to humanity, but then exclude the very people who would be able to impartially confirm the threat and its nature, and to come up with the appropriate countermeasures. It is hyperbole. This is theatre, nothing more.

More broadly, AI research is hard or even next to impossible to reproduce—as a field, we can’t be sure that their claims are true—and it’s been a problem for years.15

They make claims about something working—a new feat accomplished—and then nobody else can get that thing to work as well. It’s a pattern. Some of it is down to the usual set of biases that crop up when there is too much money on the line in a field of research.

A field as promising as AI tends to attract enthusiasts who are true believers in ‘AI’, so they aren’t as critical of the work as they should be.

But some of it is because of the unique characteristics of the approach taken in modern AI and Machine Learning research: the use of large collections of training data. Because these data sets are too large to be effectively filtered or curated, the answers to many of the tests and benchmarks used by developers to measure performance exist already in the training data. The systems perform well because of test data contamination and leakage not because they are doing any reasoning or problem-solving.16
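The contamination effect described above can be illustrated with a deliberately silly toy sketch (all names and data here are hypothetical, invented for illustration): a “model” that does nothing but memorize its training data will score perfectly on a benchmark whose items leaked into that training data, and fail on genuinely unseen items. No reasoning is involved in either case; only the overlap changes.

```python
# Toy illustration of test-set contamination. This "model" does no
# reasoning at all: it just looks answers up verbatim in its
# training data. All questions and answers are made up.

training_data = {
    "What is the capital of France?": "Paris",
    "2 + 2 = ?": "4",
    "Who wrote Hamlet?": "Shakespeare",  # a benchmark item that leaked in
}

def memorizing_model(question: str) -> str:
    # Pure lookup: no generalization, no problem-solving.
    return training_data.get(question, "I don't know")

def accuracy(benchmark: dict) -> float:
    correct = sum(memorizing_model(q) == a for q, a in benchmark.items())
    return correct / len(benchmark)

contaminated = {"Who wrote Hamlet?": "Shakespeare"}   # seen during training
clean = {"Who wrote Macbeth?": "Shakespeare"}         # genuinely unseen

print(accuracy(contaminated))  # 1.0 — looks like "reasoning"
print(accuracy(clean))         # 0.0 — the illusion disappears
```

A real language model is vastly more sophisticated than a lookup table, of course, but the measurement problem is the same: a high benchmark score cannot distinguish memorization from capability when the benchmark overlaps the training set.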

Even the latest and greatest, the absolute best that the AI industry has to offer today, the aforementioned GPT-4, appears to suffer from this issue: its unbelievable performance in exams and benchmarks seems to be mostly down to training data contamination.17

When its predecessor, ChatGPT using GPT-3.5, was compared to less advanced but more specialised language models, it performed worse on most, if not all, natural language tasks.18

There’s even reason to be sceptical of much of the criticism of AI coming out of the AI industry itself.

Much of it consists of hand-wringing that their product might be too good to be safe, akin to a manufacturer promoting a car as so powerful it might not be safe on the streets. Many of the AI ‘doomsday’ style of critics are performing what others in the field have been calling “criti-hype”.19 They assume that the products are at least as good as vendors claim, or even better, and extrapolate science-fiction disasters from a marketing fantasy.20

The harms that come from these systems don’t require any science fiction, and they don’t even require any further advancement in AI. They are risky enough as they are, with the capabilities they have today.21 Some of those risks come from abuse: the systems lend themselves to both legal and illegal abuses. Some of the risks come from using them in contexts that are well beyond their capabilities, where they don’t work as promised.

But the risks don’t come from the AI being too intelligent22 because the issue is, and has always been, that these are useful, but flawed, systems that don’t even do the job they’re supposed to do as well as claimed.23

I don’t think AI system vendors are lying. They are ‘true believers’ who also happen to stand to make a lot of money if they’re right. There is very little to motivate them towards being more critical of the work done in their field.

The AI industry and tech companies in general do not have much historical credibility. Their response to criticism is always: “we’ve been wrong in the past; mistakes were made; but this time it’s different!”

But it’s never different.

The only way to discover if it’s truly different this time is to wait and see what the science and research say, and not trust the AI industry’s snake oil sales pitch.

The Intelligence Illusion by Baldur Bjarnason

What are the major business risks to avoid with generative AI? How do you avoid having it blow up in your face? Is that even possible?

The Intelligence Illusion is an exhaustively researched guide to the business risks of language and diffusion models.

Get the ebook in PDF and EPUB for $35

  1. Lizzie O’Leary, “How IBM’s Watson Went From the Future of Health Care to Sold Off for Parts,” Slate, January 2022, https://slate.com/technology/2022/01/ibm-watson-health-failure-artificial-intelligence.html. ↩︎

  2. Jeffrey Dastin, “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women,” Reuters, October 2018, https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G. ↩︎

  3. Will Douglas Heaven, “Hundreds of AI Tools Have Been Built to Catch Covid. None of Them Helped.” MIT Technology Review, July 2021, https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/. ↩︎

  4. Melissa Heikkilä, “Dutch Scandal Serves as a Warning for Europe over Risks of Using Algorithms,” POLITICO, March 2022, https://www.politico.eu/article/dutch-scandal-serves-as-a-warning-for-europe-over-risks-of-using-algorithms/. ↩︎

  5. Parmy Olson, “Nearly Half Of All ‘AI Startups’ Are Cashing In On Hype,” Forbes, 2019, https://www.forbes.com/sites/parmyolson/2019/03/04/nearly-half-of-all-ai-startups-are-cashing-in-on-hype/. ↩︎

  6. Michael Atleson, “Keep Your AI Claims in Check,” Federal Trade Commission, February 2023, https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check. ↩︎

  7. Will Douglas Heaven, “AI Is Wrestling with a Replication Crisis,” MIT Technology Review, 2020, https://www.technologyreview.com/2020/11/12/1011944/artificial-intelligence-replication-crisis-science-big-tech-google-deepmind-facebook-openai/; Benjamin Haibe-Kains et al., “Transparency and Reproducibility in Artificial Intelligence,” Nature 586, no. 7829 (October 2020): E14–16, https://doi.org/10.1038/s41586-020-2766-y. ↩︎

  8. Luke Stark and Jevan Hutson, “Physiognomic Artificial Intelligence”, Fordham Intellectual Property, Media & Entertainment Law Journal, September 20, 2021, Available at SSRN: https://ssrn.com/abstract=3927300 or http://dx.doi.org/10.2139/ssrn.3927300 ↩︎

  9. Sam Altman, “Planning for AGI and Beyond,” February 2023, https://openai.com/blog/planning-for-agi-and-beyond. ↩︎

  10. “GPT-4,” accessed March 27, 2023, https://openai.com/research/gpt-4. ↩︎

  11. Anna Rogers, “Closed AI Models Make Bad Baselines,” Hacking Semantics, April 2023, https://hackingsemantics.xyz/2023/closed-baselines/, notably: ‘We make the case that as far as research and scientific publications are concerned, the “closed” models (as defined below) cannot be meaningfully studied.’ ↩︎

  12. The Road to AI We Can Trust, “The Sparks of AGI? Or the End of Science?” Substack newsletter, The Road to AI We Can Trust, March 2023, https://garymarcus.substack.com/p/the-sparks-of-agi-or-the-end-of-science, as Gary Marcus says: “By excluding the scientific community from any serious insight into the design and function of these models, Microsoft and OpenAI are placing the public in a position in which those two companies alone are in a position do anything about the risks to which they are exposing us all.” ↩︎

  13. Sayash Kapoor and Arvind Narayanan, “OpenAI’s Policies Hinder Reproducible Research on Language Models,” Substack newsletter, AI Snake Oil, March 2023, https://aisnakeoil.substack.com/p/openais-policies-hinder-reproducible. ↩︎

  14. David Ramel, “Data Scientists Cite Lack of GPT-4 Details,” Virtualization Review, March 15, 2023, accessed April 10, 2023, https://virtualizationreview.com/articles/2023/03/15/gpt-4-details.aspx. ↩︎

  15. Matthew Hutson, “Artificial Intelligence Faces Reproducibility Crisis,” Science 359, no. 6377 (February 2018): 725–26, https://doi.org/10.1126/science.359.6377.725. ↩︎

  16. Sayash Kapoor and Arvind Narayanan, “Leakage and the Reproducibility Crisis in ML-Based Science,” 2022, https://doi.org/10.48550/ARXIV.2207.07048. ↩︎

  17. Arvind Narayanan and Sayash Kapoor, “GPT-4 and Professional Benchmarks: The Wrong Answer to the Wrong Question,” Substack newsletter, AI Snake Oil, March 2023, https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks. ↩︎

  18. Matúš Pikuliak, “ChatGPT Survey: Performance on NLP Datasets,” March 2023, http://opensamizdat.com/posts/chatgpt_survey/. ↩︎

  19. Lee Vinsel, “You’re Doing It Wrong: Notes on Criticism and Technology Hype,” Medium, February 2021, https://sts-news.medium.com/youre-doing-it-wrong-notes-on-criticism-and-technology-hype-18b08b4307e5. ↩︎

  20. Sayash Kapoor and Arvind Narayanan, “A Misleading Open Letter about Sci-Fi AI Dangers Ignores the Real Risks,” Substack newsletter, AI Snake Oil, March 2023, https://aisnakeoil.substack.com/p/a-misleading-open-letter-about-sci. ↩︎

  21. Emily M. Bender et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New York, NY, USA: Association for Computing Machinery, 2021), 610–23, https://doi.org/10.1145/3442188.3445922. ↩︎

  22. Melanie Mitchell, “Why AI Is Harder Than We Think” (arXiv, April 2021), https://doi.org/10.48550/arXiv.2104.12871. ↩︎

  23. Inioluwa Deborah Raji et al., “The Fallacy of AI Functionality,” in 2022 ACM Conference on Fairness, Accountability, and Transparency (Seoul Republic of Korea: ACM, 2022), 959–72, https://doi.org/10.1145/3531146.3533158. ↩︎
