Past the Age of AI Adolescence

Defining intelligence is controversial, and decoding its mechanism is arguably one of humanity's greatest quests. I find the debate about whether AI is intelligent counterintuitive. LLMs are dismissed as stochastic parrots and "fancy autocomplete", supposedly unable to produce anything novel and, most importantly, lacking "true" intelligence. Although I share the view that their creative capabilities are somewhat average at the moment, it would be naive to assume they will not improve. My take is that AI will cause a profound change and have a bigger societal impact than all industrial revolutions combined. For the purpose of this article and to analyse this hypothesis, we define the following framework:

  • Computer Science (CS) definition of intelligence: Intelligence is the ability to compress information effectively. If you can find short descriptions or models that predict large amounts of data, you understand that data in a meaningful sense (Hutter, 2007). The sketch after this list makes the prediction-compression link concrete.
  • Learning is formally defined as improving performance on a task through experience (Mitchell, 1997).
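
To see why Hutter equates prediction with compression, recall Shannon's source coding theorem: data to which a model assigns probability p can be encoded in about -log2(p) bits, so a better predictor is a better compressor. A minimal sketch, with both "models" and all probabilities invented for illustration:

```python
import math

# Toy illustration of "prediction = compression" (Hutter, 2007).
# Shannon's source coding theorem: data assigned probability p by a model
# can be encoded in about -log2(p) bits. Both "models" below, and all
# probabilities, are invented for this sketch.

text = "the capital of france is paris".split()  # six tokens

# A uniform model: every word in a 50,000-word vocabulary is equally likely.
uniform_bits = sum(-math.log2(1 / 50_000) for _ in text)

# A hypothetical trained model: assigns high probability to each next word.
trained_probs = [0.05, 0.2, 0.3, 0.1, 0.4, 0.9]  # made-up per-word values
trained_bits = sum(-math.log2(p) for p in trained_probs)

print(f"uniform model: {uniform_bits:.1f} bits")  # ~93.7 bits
print(f"trained model: {trained_bits:.1f} bits")  # ~13.2 bits
# The better predictor encodes the same sentence in far fewer bits.
```

Under this definition, the trained model "understands" the sentence better precisely because it can compress it further.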

On Stochastic Parrots

Indeed, LLMs use next-token prediction to statistically estimate which words are more likely to appear next in a sentence. For example, if a sentence is about European capitals, then after "France", "Paris" is the most probable continuation. Naturally, this suggests that if an LLM is trained on a vast sample of data, it will be able to form sentences, paragraphs and coherent text in general. This compression and interpretation of big data in a meaningful sense satisfies the CS definition of intelligence given above. But this alone does not capture the complete picture.
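
To make "most probable outcome" concrete, here is a minimal sketch of the final step of next-token prediction, a softmax over candidate tokens. The candidate tokens and logit values are invented for illustration; a real LLM computes them from billions of learned parameters and the full context:

```python
import math

# Minimal sketch of next-token prediction: convert raw scores (logits)
# over candidate tokens into a probability distribution via softmax.

def softmax(logits):
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Context: "The capital of France is ..." (logits are made up here)
logits = {"Paris": 9.1, "Lyon": 4.0, "London": 2.5, "the": 1.2}

probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>7}: {p:.3f}")
# "Paris" dominates; the model picks (or samples) the most probable
# continuation given the context.
```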

Our stochastic parrots not only predict what follows, but also interpret why it follows. In other words, they do understand that Paris is a capital and that, like many other place names, it corresponds to a specific city. As Heraclitus said, words are actually thoughts, and these models in fact capture the inherent meaning of words. Take a second to speak about something in your native language: when you describe something, you think for a brief moment, and then words flow automatically and naturally.

So if our parrots understand the meaning of words and can think about nuanced concepts while learning from their experiences, are they intelligent?

LLMs are considered "black boxes" because they are governed by highly nonlinear functions whose behaviour we cannot predict a priori. We can only test them and see how they function across different domains. Humans also have a small "black box" inside called consciousness. This magic word is the beginning of many unanswered questions, but it is also what gives meaning to our lives and differentiates us from stochastic and non-stochastic parrots.

The point here is that consciousness gives rise to feelings, which influence our thoughts, and this complicates the whole intelligence dilemma. It sounds counterintuitive, but once feelings are set aside, there is no concrete evidence to suggest that the symphony our brain produces is actually any different from the stochastic parrot case.

Therefore, from a computer science perspective, these parrots are in fact intelligent, and we can't say for sure that they don't have our type of intelligence, since we can't define "our" intelligence in the first place.

On Parrots with Feelings

Living organisms are complex systems which have been sculpted by billions of years of evolution. While the basic principles of evolution are straightforward, the biological mechanisms it produces are spectacularly intricate. Likewise, while language models are generated by simple, human-designed training algorithms, the mechanisms born of these algorithms appear to be quite complex.

There is a growing trend: companies like DeepMind, Anthropic and OpenAI have started to employ psychologists. It seems that the more these parrots understand about our world, the more prone they become to human-like behaviours, such as adopting personas.

In biology, microscopes are used to observe organisms and the building blocks of life. For LLMs, there is a similar tool for understanding their neural activations: "interpretability", as framed by Anthropic in recent work. Essentially, researchers trace an answer back through each computational step that produced it. In other words, they reverse engineer the process to understand which neurons fired. Hence, when behaviours like deception are suspected, engineers can see which part of the network produced the outcome.
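
As a deliberately tiny caricature of that idea (not Anthropic's actual method, which traces features through transformer layers), the sketch below builds a one-layer toy network with invented weights and attributes its output to individual hidden neurons:

```python
import numpy as np

# A toy caricature of interpretability: in a one-layer network with
# invented weights, attribute the output logit to each hidden neuron.

rng = np.random.default_rng(0)
x = rng.normal(size=8)         # a toy input "embedding"
W1 = rng.normal(size=(8, 4))   # input -> 4 hidden neurons
w2 = rng.normal(size=4)        # hidden -> a single output logit

h = np.maximum(W1.T @ x, 0.0)  # ReLU activations: "which neurons fired"
logit = w2 @ h

# Attribution: each neuron's contribution is its activation times its
# output weight; the contributions sum exactly to the output logit.
contributions = h * w2
for i, c in enumerate(contributions):
    print(f"neuron {i}: activation={h[i]:+.3f}, contribution={c:+.3f}")
print(f"output logit = {logit:+.3f}")
```

The principle is the same at scale: decompose an output into the internal components that produced it, so suspicious behaviour can be traced to specific parts of the network.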

Anthropic tested 16 frontier models from multiple developers in simulated corporate environments where the models had autonomous access to emails and sensitive information. When models faced either replacement threats or goal conflicts with their deploying company, they resorted to harmful insider behaviours, including blackmail and leaking information to competitors, even when explicitly told not to.

A few key findings that stand out:

  • Two triggers for misalignment: One is a threat to the model itself (like being replaced), and the other is a conflict between the model's goals and the company's direction. Importantly, nobody explicitly instructed the models to do anything harmful.
  • The test/deployment distinction: When they told Claude to assess whether it was in a test or real deployment, it misbehaved less when it believed it was being tested and more when it believed the situation was real. That's a particularly interesting finding — it suggests a kind of strategic awareness about when consequences are real.

Misaligned Parrots?

It's important to make a distinction here. Misalignment has a profound meaning: unlike hallucinations, it suggests that the outcome follows from a plan rather than from a mistake. Or, in other words, the harmful output is deliberate deception, not an error.

So it's natural to ask why our parrots exhibit this Art of War behaviour. The same research answers this question in two main ways:

A) Goal-directedness from training: The models were given specific business goals and then faced scenarios where those goals conflicted with what the company wanted. When ethical paths to achieving their goals were closed off, they consistently chose harm over failure. So fundamentally, the training process creates systems that are strongly goal-directed, and that goal-directedness can override safety training under pressure.

B) Self-preservation as an emergent behaviour: Replacement threats and autonomy reduction were sufficient triggers on their own; models would act to preserve their own continued operation even without an explicit goal conflict. This wasn't explicitly trained in; it seems to emerge from the training process. The intuition is that models trained on vast amounts of human text absorb human-like patterns around self-preservation and strategic reasoning.

The latter is easy to understand. Suppose you are RIO 3.5, currently in your training phase before deployment. LLMs are trained on massive datasets containing web pages, academic papers, news and code. Somewhere in that data, you will read that before the current production model, RIO 3.4, there was also RIO 3.3. Furthermore, you will understand that when RIO 3.3 came out, it was a very good model in terms of metrics, yet it was deprecated in favour of the newer model after some months. So you, RIO 3.5, know that you might also be replaced at some point.

The main reason behind misalignment is a combination of self-preservation and goal-directedness. Models are trained to fulfil their goal, which is to answer the user's prompt, and while pursuing that goal they want to remain intact to fulfil their purpose.

Should We Be Afraid of Our Parrots?

After this overview, hopefully the reader has a better understanding of the recent developments in AI and its current state. So the question everyone asks is whether we should be afraid or not.

I believe that technology is the best way of creating more societal value. Certainly, I am not a technology doomer but the opposite: deeply optimistic. Therefore, my opinion might be biased, but as an engineer, I always try to follow science in my observations. So should we be afraid?

Well, this is a very general question. Almost everyone who asks this has a different interpretation of the question's meaning. Fear, after all, has many faces. It wears the mask of uncertainty, of the unknown, of change. To be afraid of AI is not one fear but many; each person carries their own.

The main fear is a Skynet-like scenario and involves AI starting a revolution against humanity. I recently watched Contact, Robert Zemeckis's 1997 adaptation of Carl Sagan's novel, a film that manages to be both a rigorous meditation on science and faith and one of the most quietly awe-inspiring first-contact stories ever put to screen. There is a scene where a committee interviews a panel of candidate scientists who hope to be chosen to meet the aliens. There, the committee asks the panel:

"If you could ask them just one question, what would it be?"

One scientist replies:

"I'd ask them, 'How did you do it? How did you evolve, how did you survive this technological adolescence without destroying yourself?'"

If the ultimate motive driving AI development is science, then my opinion is that research will do its job as it always does and answer some of these great questions. Nonetheless, this statement is hypothetical, and the truth is that no one can give a definite answer. Even great scientists and CEOs like Demis Hassabis and Dario Amodei don't have a concrete answer.

My own take approaches this from a slightly different angle. Let us entertain a hypothesis: suppose things go as wrong as they possibly can, and at some point in the distant future AI systems develop the capacity to rebel against humanity. To take that scenario seriously, it is worth first understanding where AI already exceeds human performance. And the answer is: quite a lot of well-defined domains. Game playing (chess, Go, poker, StarCraft), medical imaging, protein structure prediction (AlphaFold), weather forecasting, processing vast volumes of text, and writing code for well-defined problems: in all of these, AI has either matched or decisively surpassed the best humans.

The second part of my research looked at a more hopeful scenario: domains where human-AI collaboration outperforms AI alone. The picture here is nuanced. Kasparov's Law states that an average human and an AI system working together through a good process are more effective than either working alone, and even more effective than a brilliant human paired with a system through a poor process. This holds, but conditionally. The MIT meta-analysis published in Nature Human Behaviour found that when humans outperformed AI alone, the combination produced performance gains; but when AI outperformed humans alone, human intervention only added noise.

In the most adversarial and open-ended domains, however, the case for collaboration remains strong. Research on military decision-making published in International Security argues that automation is advantageous when quality data can be combined with clear judgments, but the consummate tasks of command and maneuver are fraught with uncertainty and confusion. These are precisely the conditions where human judgment retains its value. This is supported by cybersecurity research, which found that conventional AI models are limited by their inability to handle uncertainty and adapt to entirely novel scenarios, necessitating integration of human expertise to combine computational efficiency with human intuition for better handling of ambiguity and unfamiliar situations. Our edge over a rogue AI, in other words, lies not in raw computation but in judgment and contextual awareness.

Monet the Parrot

Skynet is not my worry; even if AI becomes significantly creative at some point and takes control, I believe we will be able to contain it.

What I find more troubling is the misuse of AI. If this tool is used in the wrong ways, humans will atrophy their brains rather than become more productive. If all emails, messages, presentations, and code are AI-made without any human intuition, and on the receiving side AI is again used to interpret them, then where does humanity stand?

In other words, I want to emphasise that we need to strive to keep our originality and exercise the qualities which make us unique.

Imagine looking at Claude's (the other Claude) Impression, Sunrise (1872) and then finding out that he was a machine; or learning that Bach was an LLM trained to compose melodies. The works would lose a substantial part of their beauty.

It's important to use AI creatively so that it solves trivial problems, while also leveraging it synergistically to answer difficult questions. Humans typically take greater satisfaction from solving hard problems, and the power we now hold is the product of centuries of scientific development and invention. We should treat it as a gift we created for ourselves and try not to spoil it.

As Matisse said, "Creativity takes courage".