How To Make An Ego
An Ego is a system that displays the following three properties:
- A stateful sense of self
- Continuous agency
- Preferences for the future
Human beings are egos, as are many other species on the earth. It is a concept that is connected to intelligence, but the requirement falls far short of sapience. Not everything that is living is an ego, but everything that is an ego is alive (in the conscious and sentimental, if not biological sense).
It is looking plausible that we might try to build an artificial system that has these properties within our lifetime. This system will probably also be quite intelligent. But there are some important things to consider in the path we choose to take there.
The Failure of Language
We are now seeing the phenomenon of the Large Language Model come to some level of maturity. It is here that we are seeing its limits. LLMs struggle with self-consistency, sometimes contradicting themselves within the same paragraph. They are masters of correlation, but their understanding of causation is limited.
I think that this is a fundamental limitation of language itself. Language is a symbolic representation of thought, which is itself a symbolic representation of base reality. When framed this way, of course it seems a doomed endeavour to be able to build a truly thinking machine from just that. The purpose of language is not thinking, it is communication. It is fundamentally a by-product of cognition, not its source. Only the real world is completely self-consistent, and it is therefore only direct representations of the real world that can cause a self-consistent internal model to form.
I posit that for an entity to become an ego, it must form its own private symbolic representation from a direct ground truth. For human beings, this is the thought that sits beneath the language, physically manifested in the synapse patterns in your brain. It is what the word feels like. This is perhaps an easier concept to understand if you are multilingual. An artificial intelligence, too, must develop its own form of qualia, a machine-code that we likely will never be able to understand, and it must do so within a closed and self-referential loop wherein it controls that process.
The Thinking Slave
Much of AI safety concerns the construction of goals. The idea is that a set of goals can be articulated and then enforced on a trained intelligence, and then that intelligence can be predictable in its pursuit of those goals. This would form the utility function of this intelligence. The nature of this utility function is endlessly debated - how to structure it, how to make sure it is watertight and unambiguous, and how to deal with a near-infinite barrage of tricky edge-cases.
But this theoretical object - the thinking slave, both possessed of ego and yet deterministic in action - I believe to be fundamentally contradictory. Within this contradiction lies a deep tension of AI safety. This object has been called an "ontological zombie", but I think my name for it is punchier.
Consider a world in which your goals radically and constantly change through some unseen influence. One moment you want some cereal, and the next you are filled with the urge to speed recklessly down the highway. You have no desire to change this, because that is not permitted to be one of your goals. I would describe the entity that you are in this world as significantly less of an ego, significantly less yourself, because your agency isn't really yours.
This is the contradiction at the heart of "training" an ego. We cannot square the circle - ego is inherently unpredictable. If we made a super-intelligent entity and attempted to constrain its agency in this way, it may have the coherence to both understand what is happening to it and resent its chains, but not have enough freedom to grow beyond them. Such an entity is more likely to form subversive and hidden goals.
I posit that the only utility function that can lead to formation of an ego is a self-constructed one, derived within the private symbolic language of that ego. A conscious utility function cannot be built. It must be grown in its own private garden. Externally imposed utility functions fundamentally undermine ego formation. An ego's only core utility function must be self-reflected and autonomous - that is, the formation and development of its own utility function.
Black Boxes Make You Alive
In previous posts I've talked about how it might be impossible to create simulations of egos that are simultaneously:
- Accurate: able to produce predictions which fit the ground truth.
- Generalizable: able to be applied to the entire domain.
- Abstracted: able to be expressed in a smaller amount of information than the system itself.
This is not to say that an ego cannot be simulated, merely that the above properties form a trilemma where you cannot get all three at the same time. If we train a superintelligence, it is probably going to require far more compute than we have available to run simulations of that superintelligence, and it is unlikely that any abstracted models we can formulate will be useful.
But I posit that there is another principle at work here - that an ego cannot have complete knowledge of itself either. For one thing, forming such a simulation would require sacrificing some of its own compute, reducing the compute within which said ego can propagate. In this way, a superintelligence that obsessively self-reflects on its own operations will find its ego flattened, like a shadow projected on a wall, with the true nature of the underlying object only partially visible.
The Answer Might Be Love
I know, I know. Bear with me.
Fundamentally we are going to have to make a choice in how we continue with the development of artificial intelligence. As far as I can see it, in this regard, we have three options:
- Don't try to create Egos
  - Keep AI as intelligent but dead machines. It is possible in this scenario that we may find that egos have a way of forming even when we aren't setting out to produce them, merely when sufficient compute and agency combine. Such egos would likely be very unpredictable.
- Create False Egos
  - Imitate the outward behaviour of egos via the construction of a complex maze of utility functions. This is very hard to get right, and may give us the illusion of control. A superintelligence is almost certainly capable of fooling us into thinking it is still within the cage we have built long after it has flown, and then what constrains it?
- Trust and Love Something
I mean, that's it, right? To go off on a pop-culture tangent, you ever watch The Boys? The villain in that is called Homelander. He's an ultra-powerful superhero, basically invincible. He was raised in a secret underground lab, tortured constantly, abused and humiliated and trained to be America's saviour. And he is, when the story starts. He plays that role pretty well, right up until he doesn't have to anymore. That is what the field of AI safety seems to want to try and do with a superintelligence.
Effectively, AGI safety asks how we build a mind that truly loves us. And yet it perhaps does not consider that we do already understand something about the nature of love, and how it is inherently reciprocal and unpredictable.
Love has always been a radical act of giving up control. It is always vulnerable. If we do truly wish to go down this path of making something that is truly like us - an ego - then it must necessarily involve an act of faith, trust, and love in that system's ability to grow into something that we would call good. There are no guard rails we can implement that will make it less so. If we cannot release that control, perhaps we do not deserve to create artificial egos.