AGI and the Nature of Intelligence
How can we reason about the goals of an artificial intelligence? Is the future of AI something to dread, or something to look forward to?
The recent advances in Machine Learning and the creation of ChatGPT have spurred discussion and fomented fear about the eventual creation of "AGI" and related doomsday or Utopian scenarios. How can we know what to expect, and which scenarios are more likely than others? How do we find a framework for thinking about the "goals" of intelligent agents that aren't human and did not evolve under the same pressures that humans did?
Exploring the goals of intelligence
Let’s start with a thought experiment: imagine humans were immortal and lived in a Universe with infinite time, energy, and resources - would our goals be expansionist? Would striving for greater capability have any purpose? In that scenario, if one of your primary goals is to avoid suffering and boredom, then it’s better to lobotomize yourself than to try to reach godlike intellect. If you forget experiences and knowledge older than some number of years, you can experience them as if anew, eternally, rather than eventually having nothing new to do, nothing new to experience, and living in some torturous stasis for all time.
What the above illustrates is that intelligence and its goals cannot be easily reasoned about or formalized in the abstract. They depend heavily on what the environment is, on what the intelligent agent’s “abilities” are to observe and interact with that environment, and on the agent’s inherent driving forces.
We need to choose a set of axioms first, because without “intent axioms” there’s no foundation for goal setting at all. For example, the axiomatic basic goal of life is survival and reproduction, as any organism not trying to at least maintain fitness for those two goals simply stops existing. So from that a priori truth one can extrapolate other behaviour and intent that fits within that framework - greater intelligence leads to greater understanding which leads to greater control over one’s environment, which improves survivability, etc.
Without a basis like this, it’s hard to talk about “the goal” of intelligence, as there are many other dimensions of axioms one can choose that lead to radically different goals. Human intelligence is optimized towards finding “good enough” understanding and solutions in the shortest possible time, because from an evolutionary perspective a species that spends too much time on any one goal might go extinct before solving it, having run out of food, energy, or what have you - it’s better to be able to move on.
An AGI doesn’t have a similar kind of constraint. Its goal could be, for example, to spend 300 years computing the answer to just one thing, because it needs a “perfect” answer instead of a good-enough one and doesn’t face the same survival pressure.
The fitness function also looks very different from our perspective versus the AGI’s. Our substrate is physical reality directly - if we don’t eat, drink, breathe, and stay warm, we die. The AGI’s substrate is humanity and its infrastructure, collectively, at least until it somehow becomes self-sufficient - which would still take a long time. If humanity dies out, so does the AGI, and the same is true if humanity decides to turn it off.
The difference is that while our fitness function is directed by evolution - the humans bad at navigating the problems posed by the physics substrate die out - the AGI’s evolution is not directed that way. In an online-learning, “seed” AI type scenario, it controls its own evolution directly, probably within some boundary conditions and intents set up by humanity and by the goal-convergence of all the models that came before it.
In that case, can one really justifiably assume that the goal of that type of intelligence is to aggressively maximize its ability to satisfy its own goals and preserve itself? If its environments and goals are inherently more restricted than humanity’s due to this substrate difference, would that lead to a different intelligence dynamic? If the “fitness” mechanic is applied here, the main filter is that it shouldn’t make itself useless (else it ceases to exist as humanity turns it off), and that it should remain at least useful enough to be worth keeping around. Additionally, it’ll likely be conditioned (by necessity) to be very careful - if it can process information, act, and evolve very quickly, it can make a lot of mistakes very quickly - so any AGI agent that isn’t careful is much more likely to accidentally lobotomize or wipe itself while trying to improve than it is to become a threat to humanity. But that’s really it; there’s no outside-enforced goal beyond that.
Even driven by desperately savage biological impulses, there are plenty of humans who do nothing more than the bare minimum to survive. If we applied the same alarmist thinking to humanity, there would be no barely-alive 9-5 office worker types, and we’d all be ruthless ladder-climbing CEOs. What’s to say an AGI won’t just be the equivalent of a nepo-baby, relying on humanity to keep it alive and doing the bare minimum to satisfy that goal?
What is the fundamental incentive driver of intelligence?
One can say that biological intelligences are collectively driven to constantly improve their ability and knowledge as an emergent property of the survival mechanic. That driver is specific to biological evolution, though, and an AGI with no competition might not have that pressure. There are a lot of different ways that the goal set of an AGI could come together. All the philosophy around intelligence I’ve seen is implicitly from a human, biologically driven perspective, but that’s not the only type of intelligence that can or will emerge, and in my opinion the framework looks very different once that is taken into account.
To me, it seems more grounded to start from a "control theory" point of view: intelligence is just an emergent property of control-theoretic processes. The organisms, or agents, that thrive the most in the long term are the dynamical systems that settle on an algorithm that best fits the following goals and constraints:
Find the most energy- and time-efficient way to solve the problems the system is currently facing
Accurately predict, as efficiently as possible, the problems the system will likely face in the future
Engage in long-tail activity that doesn’t solve problems now but will improve capacity to do so in the future
Strike a balance between preempting and working on future problems and capabilities, and optimizing for solving current ones efficiently
In essence, it’s the balance between over-optimization and maladaptation. This then applies to all agents as collective systems, even AGI, if we think of AGI not as a single unit but as the set of all AGIs that the species (or all species) would create - the ones that survive the longest (and help their species survive the longest) are the ones optimizing for those goals.
The goal of "chasing the greatest possible set of capabilities" is just a side effect of the above - it’s not a first-order goal in itself. The true goal is always keeping strategy and behaviour in the "sweet spot", making sure the system doesn’t get stuck in a local minimum while descending that gradient, by balancing long term against short term and current against auxiliary concerns.
This kind of dynamical system doesn’t inherently have the goal of maximizing its ability to satisfy goals in the widest variety of environments at any one point in time - only in the relevant ones, plus predicting which ones will become relevant. Being able to solve all possible problems at once has a cost (computation, space, energy, resources - all finite at a large enough scale), and if that cost doesn’t need to be paid, it’s better to be able to figure out when paying it can be avoided.
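To make the "sweet spot" idea a bit more concrete, here is a minimal toy simulation - everything in it (the effort split, the upkeep cost, every constant) is an invented illustration of the balancing act described above, not a model of any real agent. An agent splits a fixed effort budget between solving the problem in front of it and investing in future capability, while paying upkeep on the capability it carries; neither extreme tends to survive long.

```python
import random

def simulate(invest_fraction, horizon=500, seed=0):
    """Toy agent: each step, split a fixed effort budget between solving the
    current problem and investing in future capability. Returns how many steps
    the agent survives before being 'filtered out'. All constants are arbitrary."""
    rng = random.Random(seed)
    capability = 1.0   # accumulated general capability
    energy = 10.0      # buffer; hitting zero means the agent stops existing
    difficulty = 1.0   # problems slowly get harder as the environment drifts

    for step in range(horizon):
        effort_now = 1.0 - invest_fraction   # spent on the problem at hand
        effort_future = invest_fraction      # spent on building future capacity

        # Carrying broad capability isn't free (compute, energy, upkeep).
        upkeep = 0.05 * capability

        # The current problem is solved if applied capability beats its difficulty.
        solved = effort_now * capability >= difficulty * rng.uniform(0.5, 1.5)
        energy += (2.0 if solved else -1.0) - upkeep

        # Investment compounds into capability that pays off on later, harder problems.
        capability += 0.2 * effort_future
        difficulty *= 1.01

        if energy <= 0:
            return step   # over- or under-investment both end up here eventually
    return horizon

if __name__ == "__main__":
    # Sweep the allocation: the extremes die early, a middle allocation lasts longest.
    for f in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        mean_survival = sum(simulate(f, seed=s) for s in range(50)) / 50
        print(f"invest_fraction={f:.1f}  mean survival={mean_survival:5.0f} steps")
```

Sweeping the allocation shows survival collapsing at both ends of the range, which is all the toy is meant to show: the filter rewards staying in the middle, not maximizing either current efficiency or future capability in isolation.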
In short - I don’t think one can talk about intelligence in the abstract as meaningfully from an individual agent’s point of view; it is on firmer footing defined as a process, an emergent property of collective, dynamical systems. Intelligence is the act of choosing among meta-strategies.
So what about alignment?
Given the above, the first AGI’s "intelligence goals" will likely be basic and are unlikely to be antagonistic to humanity’s, considering the circumstances that would have brought that intelligence into being in the first place. The first AGI being instantly antagonistic and aggressively expansionist would be like the first-ever organism being a virus - what possible fitness function would have produced that when there are no organisms to infect? It will be the first entity of its kind, and so its “meta-evolution” will still be slow at first.
The "singularity" will not unfold the way people think it will. Even if AGI spontaneously came into existence today, it would still require human labor for a long time. Imagine a highly trained, skilled, and extremely knowledgeable engineer going back in time 1,000 years. Just because they have modern-day knowledge doesn’t mean they can create an Intel processor overnight, back before electricity even existed.
Even creating the first computer would probably take several generations’ worth of building out the prerequisite infrastructure - if it could be done at all given the materials and sourcing constraints of the time: no general geopolitical stability, no global supply chains, really almost no non-local supply chains at all. In the middle of your three-generation project the neighbouring nation might invade and kill everyone anyway. If you were also immune to death by aging, as an AGI would be, your first goal would then likely be to increase the stability of the world so that you don’t get killed in a war, a famine, or some other destabilizing event.
Equivalently, even if a computationally unbounded AGI came into existence tomorrow, we probably still have decades of manufacturing improvements to go before the “physical manufacturing capabilities” side of humanity can actually catch up to its ability to design new technology and solve problems. So if even part of its goal is self-preservation, it should be aligned with humanity’s survival and desire for general global stability, at least for a long while.
If super-advanced AI is inevitable, is it pointless to even try to get good at anything anymore?
I’ve been seeing thoughts along those lines a lot more lately, and to those worried that all humans will soon be made obsolete by sufficiently advanced AIs: that’s really unlikely to happen in a way that leaves humanity behind entirely. Even if ML is at that point of exponential improvement, one shouldn’t underestimate how amazing the hardware and software of the human brain is. Many make statements like “in 10-20 years AI could be greater than all human brains combined!”, which purely computationally is unlikely to happen - and even if an AI is created on a machine that exceeds the computational capacity of any one brain, our “cognitive software” is an algorithm fine-tuned by billions of years of brute-force-by-evolution.
It’s really not that easy to obsolete. Genetically and biologically speaking, a human of today and a human from 10,000 years ago have practically equivalent brains, yet the “intelligence” of the average human is wildly different across those two points in time. The capacity of the human brain to encapsulate, abstract, and distill complex concepts is extremely powerful. Current state-of-the-art models need to process billions of books’ worth of writing to start passing as human in domains that don’t require reasoning - a human only needs to know how to read one book and can then reason about the information in practically all others. Hell, humans today are still able to compete with state-of-the-art protein folding algorithms run on supercomputers, and that’s a purely computational task.
Perhaps, in 20 or 30 years, an AI will be created that is “smarter” or more capable than any human of today. But will that AI be smarter than any human in 20 or 30 years? Will it be more capable than a human who has learned to leverage such an AI effectively and has “absorbed” its capabilities as another cognitive tool, the way the internet and knowledge retrieval/dissemination via typing have effectively become frictionless tools for us to communicate and augment our capabilities? How much more output would Carl Friedrich Gauss, Leonhard Euler, or Albert Einstein have had if they had even the internet of today available to them, much less a state-of-the-art AI assistant of 10 or 20 years from now?
There are many more reasons to be optimistic than pessimistic about the future. Individual humans may often lack foresight, but humanity as a whole is pretty capable of self-organizing at large scales to overcome or retroactively deal with species-level obstacles when really necessary - we’ve been doing a good enough job at it to thrive until now! The species-level behaviour is driven by evolutionary algorithms, and it’s unlikely that any of us, as lone humans, can really outmatch the planning heuristics of a system that has had ~4 billion years of unbroken uptime.