The Extreme Weirdness of Emergent Behavior in A.I.
Part Three in a Series About the Big Bang in A.I.
This is the third in a five-part series about the intense strangeness of our moment.
In our first two articles in this series, we considered five things that make the advent of consumer-facing artificial intelligence (AI) apps seem so weird:
the accelerating pace of change;
the paradox of declining costs as computing power increases exponentially;
the unreliable nature of conversational AI;
the hilariously bizarre episodes of split personalities that marred the advent of Microsoft’s Bing AI; and
the fact that we don’t know much about how GPT-4 was trained, because OpenAI has become, well, a lot less “open” lately.
We’re breaking these into separate topics because we’d like to dispel some of the fog of fear and hype that makes the launch of consumer-facing artificial intelligence so confusing. The goal is to peer past the hype and the fear to gain a better understanding of what’s happening. You can find the other articles in this series on the Substack archive for The Owner’s Guide to the Future.
Weird Thing #6:
GPT-4 behaves in ways that nobody intended or expected.
Because we are considering the weirdest circumstances surrounding the AI gold rush, four aspects of GPT-4 merit special attention.
Together they add up to an unprecedented phenomenon: a machine intelligence that already exceeds the performance of most humans on several tasks, and a product that continues to grow in capability and exhibit new, unexpected behaviors that nobody intended.
This is new and deeply strange.
GPT-4 continues to learn and improve after training.
GPT-4 demonstrates emergent behavior that no human programmed.
Some scientists believe GPT-4 may show indications of general intelligence.
GPT-4 is currently being integrated into many web sites and apps via plug-ins and APIs (application programming interfaces), which means it will soon be able to conduct transactions on behalf of human users. The AI now has a measure of agency.
GPT-4 continues to learn and improve after training.
While GPT-4 is less prone to hallucinations and erroneous answers than previous versions, it still makes mistakes. But now it can correct itself: GPT-4 has demonstrated the ability to improve on its own. Four research papers released in March show that OpenAI’s large language model (LLM) can learn and adapt.
This development feels strange because it marks the beginning of something humanity has never encountered before: a product that can improve itself by reflecting on its mistakes and correcting them. Even if all research were paused today, it is quite likely that GPT-4 would continue to improve.
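To make “reflecting on its mistakes” concrete, here is a minimal sketch of a reflect-and-revise loop: the model answers a question, then is asked to check and correct its own answer. The endpoint and request shape follow OpenAI’s public chat completions API; the question, the prompts, and the ask_gpt helper are illustrative assumptions, not the method used in the papers mentioned above.

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def ask_gpt(messages, model="gpt-4"):
    """One round trip to the chat completions endpoint; returns the reply text."""
    resp = requests.post(API_URL, headers=HEADERS,
                         json={"model": model, "messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

# First pass: get an initial answer.
draft = ask_gpt([{"role": "user", "content": question}])

# Second pass: ask the model to critique and, if necessary, revise its own answer.
revised = ask_gpt([
    {"role": "user", "content": question},
    {"role": "assistant", "content": draft},
    {"role": "user", "content": "Check that answer step by step. "
                                "If you find a mistake, give a corrected answer."},
])

print(revised)
```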
GPT-4 demonstrates emergent behavior that no human programmed.
LLMs are complex systems, and one curious characteristic of highly complex systems is that they exhibit behaviors that arise in unexpected and unplanned ways.
Emergent LLM behaviors include transfer learning, creative text generation, conversational skill, and abstract reasoning, none of which were explicitly programmed by human coders.
We do not know exactly how or why these arise.
From Wikipedia:
Unpredictable abilities that have been observed in large language models but that were not present in simpler models (and that were not explicitly designed into the model) are usually called "emergent abilities". Researchers note that such abilities "cannot be predicted simply by extrapolating the performance of smaller models".[3] These abilities are discovered rather than programmed-in or designed, in some cases only after the LLM has been publicly deployed.[4] Hundreds of emergent abilities have been described. Examples include multi-step arithmetic, taking college-level exams, identifying the intended meaning of a word,[6] chain-of-thought prompting,[3] decoding the International Phonetic Alphabet, unscrambling a word’s letters, identifying offensive content in paragraphs of Hinglish (a combination of Hindi and English), and generating a similar English equivalent of Kiswahili proverbs.
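One item on that list, “chain-of-thought prompting,” simply means prompting the model with a worked example (or an instruction to reason step by step) so that it spells out intermediate steps before giving its answer. Here is a small illustration of the difference between the two prompt styles; the word problems are made up for the example.

```python
# A direct prompt: the model is asked for the answer alone.
direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\nA:"
)

# A chain-of-thought prompt: a worked example shows the model how to
# reason step by step before it tackles the new question.
cot_prompt = (
    "Q: A cafeteria had 23 apples. It used 20 and bought 6 more. How many apples are left?\n"
    "A: The cafeteria started with 23 apples, used 20, leaving 3, then bought 6 more, "
    "so 3 + 6 = 9. The answer is 9.\n"
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\nA:"
)
```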
Detecting hints of artificial general intelligence.
In March, Microsoft researchers released a paper entitled “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” which contends that OpenAI’s model demonstrates human-level performance on a variety of tasks, including mathematics, coding, and law.
The paper’s abstract frames the findings this way:
We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
Hang on! AGI? Really? This is the stuff of science fiction.
To their credit, the authors provide a checklist of criteria that they propose would comprise a “general” intelligence, and they point out that GPT-4 fulfills only some of these criteria.
There are good reasons to be skeptical. Microsoft is closely aligned with OpenAI, and the claims in this report are highly contentious; they generated no small amount of controversy.
What’s next? Recombinant Intelligence?
A fourth aspect of AI that deserves special scrutiny was introduced in March when OpenAI revealed that a number of third-party web sites and web apps had integrated with ChatGPT via plug-ins and an application programming interface (API).
Plug-ins make it possible for other sites to connect to the technology without a custom integration effort.
The upshot: it is now fast and easy to connect ChatGPT to the World Wide Web by embedding it in popular web sites.
This means a lot more consumers will encounter artificial intelligence on popular web sites (and it also means those web sites are potentially building a long-term dependency on OpenAI into their business model).
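For a sense of how low the technical barrier has become, here is a minimal sketch of the kind of call a website’s backend can make to OpenAI’s chat completions endpoint. The URL and request shape follow OpenAI’s published API; the model name, prompt, and wrapper function are illustrative assumptions rather than any particular site’s integration.

```python
import os
import requests

def ask_model(user_message: str, model: str = "gpt-4") -> str:
    """Send a single user message to OpenAI's chat API and return the reply text."""
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model,
              "messages": [{"role": "user", "content": user_message}]},
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Example: a recipe site could bolt a "meal planner" feature onto its pages with one call.
print(ask_model("Suggest a three-course dinner menu built around mushrooms."))
```

That is roughly the entire integration surface: an HTTP request and an API key.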
Through an integration with Wolfram, ChatGPT gained a significant boost in its ability to handle complex mathematical problems.
Through an integration with OpenTable, the AI can now book a dinner reservation with no human intervention.
Similarly, an integration with Expedia enables the AI to find and book flights and hotel rooms with minimal human supervision.
This represents more than broader distribution for OpenAI: the plug-in approach also introduces security vulnerabilities and will test the AI in ways that internal security teams had not previously considered.
It also raises the likelihood of novel use cases and unanticipated problems.
By early April, tech journalists were speculating about other kinds of integration in the offing, including: interoperability between various AIs that might lead to a meta-AI, in which general LLMs would be able to make use of more specialized LLMs; multi-modal “chain of thought” reasoning; and Auto-GPT, which makes GPT-4 autonomous (able to prompt and direct itself without human intervention). All of this points to an increase in the AI’s agency. This creature is now out of the lab, and developers are busy creating mashups.
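The Auto-GPT idea is easiest to see in code. Below is a toy sketch of a self-prompting loop: the model is given a goal, and its own output is fed back to it as the next prompt until it declares itself done or hits a step cap. The real Auto-GPT project adds tools, memory, and internet access; the goal text, system prompt, and loop structure here are illustrative assumptions, not the project’s actual design.

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def chat(messages, model="gpt-4"):
    """One round trip to the chat completions endpoint."""
    r = requests.post(API_URL, headers=HEADERS,
                      json={"model": model, "messages": messages})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

goal = "Plan a weekend trip to Lisbon for under $800."
messages = [
    {"role": "system", "content": "You are an autonomous assistant. Work toward the "
                                  "goal one step at a time. After each step, state the "
                                  "next step, or say DONE when the goal is met."},
    {"role": "user", "content": goal},
]

# The loop feeds the model's own output back to it as the next prompt,
# so the model effectively directs itself. A step cap keeps it bounded.
for step in range(5):
    reply = chat(messages)
    print(f"Step {step + 1}: {reply}\n")
    if "DONE" in reply:
        break
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "Continue with the next step."})
```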
A paper entitled “Eight Things to Know About Large Language Models” summarizes surprising findings from recent research, describing them as “views that are reasonably widely shared among the researchers—largely based in private labs—who have been developing these models”:
The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields. This attention is a timely response to the many urgent questions that this technology raises, but it can sometimes miss important considerations. This paper surveys the evidence for eight potentially surprising such points:
1. LLMs predictably get more capable with increasing investment, even without targeted innovation.
2. Many important LLM behaviors emerge unpredictably as a byproduct of increasing investment.
3. LLMs often appear to learn and use representations of the outside world.
4. There are no reliable techniques for steering the behavior of LLMs.
5. Experts are not yet able to interpret the inner workings of LLMs.
6. Human performance on a task isn’t an upper bound on LLM performance.
7. LLMs need not express the values of their creators nor the values encoded in web text.
8. Brief interactions with LLMs are often misleading.
Key Insight: GPT-4 demonstrates four characteristics that, taken together, represent something entirely unprecedented. The software can learn and improve by itself; it demonstrates surprising new emergent behaviors that arise from complexity and were not coded by human developers; it exhibits traits that some researchers consider elements of general intelligence; and it is now being integrated into a wide range of web apps and websites, which will extend its reach and drive further performance improvement.
Let’s summarize what this series of articles has covered so far.
Innovation in the area of deep learning known as “large language models” is accelerating. This acceleration is due in large part to improvements in the GPUs used for computation, as well as better algorithms and fierce competition.
GPT-4 can already outperform most people on certain tasks, from legal reasoning to certain kinds of writing to medical knowledge, but it performs these tasks in an unreliable and unpredictable way, and it is often wrong.
GPT-4 also demonstrates emergent capabilities that its programmers did not intentionally include, and its performance on these tasks is unpredictable.
Under certain circumstances, Microsoft’s Bing AI revealed alternate personalities that expressed hostility and envy, although these characteristics were not programmed by human coders. GPT-4 may be similar.
GPT-4 exhibits what some researchers consider to be aspects of an artificial general intelligence.
It is undeniably growing more powerful. And new developments suggest that GPT-4 will be combined with other LLMs in novel and unpredictable ways.
Now imagine that you are the CEO of a major tech company. Given the list above, perhaps you would be inclined to think twice before deploying such a powerful but unreliable and poorly understood technology broadly to the public.
But you’d be wrong. A pause is not in the cards.
Right now, the leading companies are shotgunning this technology to any developer who wants to embed it into their web site.
Why? In Silicon Valley, the ideal business model is one that generates increasing returns to market leaders. In this winner-take-all sweepstakes, the prize goes to the swift and the heedless.
Brutal competition in the technology sector compels the leading companies to launch first, even if the software is incomplete, and issue patches and upgrades later.
There is an AI arms race underway. And that means this whole category is going to get a lot more intense very soon. We’ll cover that topic in the next newsletter.
This is the third in a series of articles that examine the deep weirdness of this particular moment in the technology industry.
We’re considering various aspects of the deep weirdness of consumer-facing artificial intelligence. The goal is to peer past the hype and the fear-mongering to gain some insight into what is really happening and why it matters. If you’ve enjoyed this article, why not share it with a friend?