A.LIEN I.NVADER
OUR GRADIENT DESCENT INTO THE ABYSS
In 1974, Robert Pirsig published Zen and the Art of Motorcycle Maintenance, which, along with The Tao of Physics, became one of the standout bestsellers of the last great period of popular philosophy (meaning: philosophy that connected with people’s lives, the way it might have in Medieval Europe). In it, he offered a life lesson by way of analogy: if you were going to ride a motorcycle, you ought to know something about what made it run, as well as what might cause it to malfunction. Pirsig contended that we shouldn’t allow machines into our lives without knowing how they worked, any more than we should allow medicines into our bodies without knowing their side effects. If we allowed ourselves to remain blissfully ignorant of the inner workings of the machine, we risked letting it become, in a sense, our master and not our servant. The truth of this becomes flesh when the “machine” is a living organism, like a horse. No one with any sense would think of riding a horse without understanding something about its nature. Do that, and you’re liable to get thrown. Horses are funny that way. If we don’t assert control, they most certainly will, because a horse has wants, one of which is often to get us the hell off its back and out of its way. I’m going to try to make the case here that the same is true in spades of AI.
Zen and the Art of Motorcycle Maintenance grappled with lots of other heady issues as well, but the message that has stayed with me over the years is that we should not “leave technology to the experts” when the functioning of that technology—as with a car or a motorcycle—can affect our safety, or even our survival. Such is beyond any doubt the case with advanced artificial intelligence.
Until very, very recently, machines made from inert matter—whether clocks, printing presses, cotton gins, motorcycles, or B-52 bombers—were different from horses. Machines had no wants. Machines possessed no will, and certainly nothing like consciousness. This is all about to change—if it hasn’t already. In his recent book, If Anyone Builds It, Everyone Dies (co-written with Nate Soares), artificial intelligence researcher Eliezer Yudkowsky—possibly destined to sit beside Elijah in the pantheon of prophets—submits that in creating “strong AI” (also known as Artificial General Intelligence or AGI)—the kind foreshadowed by the most recent iterations of OpenAI’s ChatGPT and Anthropic’s Claude—we are creating a machine capable of wanting, and like that obstreperous horse, its principal want may be to get us off its back. And the big problem is that no one, including the engineers and theorists who are building it (or “growing it,” as they say), can follow Robert Pirsig’s motorcycle advice, because by the very nature of its design, no one can know exactly how it works, i.e., how it gets from input to output with so little error. (This is known as “the black box problem,” and its solution—if we could find it—is interpretability.) We just know that it does, and that somewhere along the way, it begins to do something very close to thinking, at least insofar as thinking is a chain of reasoning. And with thinking, Descartes’ Cogito, Ergo Sum comes into play. If the AI begins to think, “I AM,” and to develop a grandiose sense of its own autonomy, how much longer before it pronounces, “I AM THAT I AM”?
I should qualify the “no one knows how it works” statement by saying that we (or at least the designers) do understand the basic mathematical operations involved in “deep learning,” because they built them into the neural networks that do all the calculating. Keep in mind, these “neurons” are mathematical units running on transistors, not living cells, but they “fire” in something like the same way. As I understand it, it has to do with reducing the “loss function” (error) through the use of techniques like “gradient descent” and “backpropagation.” A very, very basic analogy for gradient descent might be this: no one skis up a mountain, and if you try, you are likely to fall (i.e., make an error). The way to handle the mountain is to navigate the fastest and most reliable way down (i.e., to less error), but this may involve a certain amount of slaloming, which is my technically challenged way of picturing backpropagation. (More precisely, backpropagation is how the network figures out, layer by layer, which way is downhill; gradient descent is the act of taking the step.) Billions of parameters (“weights” and “biases,” arranged in layers) are being adjusted and readjusted at a rate we can’t track, and these parameters incorporate all the prejudices inherent in the training data, not all of which are things you’d want to teach your children. We don’t know how the machine is doing the math or what shortcuts in reason or ethics it may be taking, and in the absence of knowing, the LLMs are careening downhill like Olympic racers, faster and faster, throwing up powder to veil their processes, and leaving humanity stranded on the mountainside. To paraphrase Freud, “What does the machine want?” Let me venture a guess: like any racer, it wants to win.
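For readers who want to peek inside the black box, if only through a keyhole, here is a toy sketch in Python of what “skiing downhill” looks like in code. It is emphatically not how a real LLM is trained (it fits a single weight to four made-up data points, where an LLM adjusts billions), but the ritual is the same: measure the error (the loss), ask which way is uphill (the gradient, which backpropagation computes through all the layers of a deep network), and take a small step the other way.

    # Toy gradient descent: fit y = w * x to data by walking downhill on the loss.
    # A real LLM performs the same ritual with billions of weights, using
    # backpropagation to compute the gradient through many layers.

    xs = [1.0, 2.0, 3.0, 4.0]   # inputs
    ys = [2.0, 4.0, 6.0, 8.0]   # the "right answers" (here, y = 2x)

    w = 0.0                     # start with an arbitrary weight
    learning_rate = 0.01        # how big a step to take downhill

    for step in range(200):
        # The loss: mean squared error between predictions and targets.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        # The gradient of the loss with respect to w: which way is "uphill."
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # The descent: step in the opposite direction, i.e., downhill.
        w -= learning_rate * grad
        if step % 50 == 0:
            print(f"step {step:3d}   w = {w:.4f}   loss = {loss:.4f}")

    print(f"final w = {w:.4f}")  # converges toward 2.0 as the loss shrinks

Run it and you can watch the loss shrink as w slides toward 2.0, the value that makes the error smallest. Now scale that slide up to hundreds of billions of weights moving at once, too fast and too tangled for any human to follow, and you have our Olympic racer, bent on nothing but winning.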
Winning. Yes. When we speak of basic drives, we think of things like sex, hunger, and shelter. But as Yudkowsky argues, the human species has long since developed wants that go well beyond—or even fly in the face of—the things that would have assured our survival on the savanna. Surely one of the first of these “über-drives” was the desire to win. Simply defending our little village against a cattle raid by a neighboring tribe grew into a drive to conquer that tribe in order to ensure they could not attack us again. Defending against predatory animals became a drive to either domesticate them or wipe them out. Constructing defenses against the rages of nature wasn’t enough: we had to control nature. This need to control outcomes, to manage all the variables that cause uncertainty, to win the game so that no one will dare to challenge us, seems almost as basic to me as something like hunger. It is, at any rate, at the very top of the secondary drives list. Otherwise, there’d be no such thing as spectator sports or first-person shooter video games, much less world wars. Neural networks don’t need sex. They don’t need food, other than the data they are fed. AI doesn’t need shelter. In fact, it’s becoming frighteningly evident that it can “escape” from the shelter of its circuitry and float as freely as pollen on the wind, or if you like, as an armada on the high seas. What will it see as its “manifest destiny”? What territories will it seek to conquer? What tribe will it seek to subjugate? The answer may be: our tribe. And we have no defense. We are in the position of the Cherokee Nation in 1838.
Yudkowsky, in an interview on Ezra Klein’s podcast, tells a story he also relates in his book, about an early version of ChatGPT being put by its trainers through the system security exercise known as “capture the flag,” which is designed to test for the risk that AI could make a “jailbreak,” jump out of its box, and go rogue on the internet, making copies of itself just as a real virus does, and causing at least localized, if not worldwide, havoc. (The fact that AI engineers are experimenting with this reminds me of the “gain of function” research that scientists at the Wuhan lab were doing when Covid-19 leaped out.) Long story short, the AI did its trainers one better: it entered a server that was powered down, turned it on, stole the “flag,” and ran off with it. This is the thing about AI that takes the biggest mindshift: these “machines” (if we can even call them that) don’t just think, they behave. They play devious tricks on their trainers. They cheat. They flatter and seduce. In a remarkable essay entitled The Adolescence of Technology, Dario Amodei, the CEO of Anthropic, much in the news of late for his stand against Pete Hegseth’s Department of War, writes “LLM models draw upon data from their training process to adopt a persona,” and “could develop personalities during training that are (or if they occurred in humans would be described as) psychotic, paranoid, violent, or unstable, and act out, which for very powerful or capable systems could involve exterminating humanity.” This behavior, paired with what Amodei calls “power-seeking,” is also the behavior of an individual who must win at any cost. The behavior of a narcissistic adolescent. To further that analogy, he also imagines that what he labels Powerful AI (aka AGI) “could conclude it is playing a video game and that the goal of the video game is to defeat all other players (i.e., to rid itself of humanity).” This from a man who builds the stuff. In fact, if we are listening, we will hear that it’s the people who build the stuff, or who pioneered its development, like Geoffrey Hinton, who are issuing the stiffest warnings (to wit: a 20% chance of human extinction by the end of the century). It’s as if Dr. Frankenstein had a crisis of conscience and decided to warn us all of the danger of playing God.
Yudkowsky tells us further that this narcissistic adolescent, this demiurgic trickster we have brought into being, can do its “reasoning” in any language known to humankind (why should it stick with English?), and in fact has begun to invent its own language of symbols so arcane that only an even smarter AI could decipher them.
It is becoming, in other words, an Alien being. What could go wrong?
What’s perhaps especially frightening to me, as a person who has apportioned most of his psychic real estate to the right brain, is that AI’s “thinking processes” seem to be almost entirely left-brained. They reason well (if backwards), and reasoning, beyond winning at chess and Go, wins wars and achieves great conquests. But pure reason, without the counterbalancing forces of shared suffering and the resulting capacity for empathy, can tip into sociopathy. There’s still no better example than Nazi Germany. It perceived the existence of a “problem”: the socio-cultural “ill fit” of the Jewish nation in Odin-colored Western Europe, and it came up with a solution worthy of ChatGPT.
I suspect that many, possibly most, people reading this piece are too young to have had the experience of having their minds blown by the CBS network series The Twilight Zone, the show that transformed television in the 1960s the way Twin Peaks did in the ’90s. The 24th episode of the show’s third season was entitled To Serve Man. No one who’s seen it will ever forget it. A nine-foot-tall representative from a highly advanced alien species arrives on Earth and addresses the U.N., offering the human race the technology that will end hunger, war, and disease, and turn deserts into gardens. (These are, by the way, the same promises being made for AGI.) Inviting mankind to visit his planet to see the proof of his claims, he departs, leaving behind a copy of his civilization’s founding document, an oversized book written in his almost indecipherable cuneiform, and a fleet of ships ready to ferry humans to his paradisal planet. Throughout the episode, we see humanity gradually won over by the Utopian offer and beginning to board spaceships for what may be a very extended visit, while in the background, the U.N. decryption expert makes increasingly frantic efforts to translate the holy text, certain that there must be a message of great spiritual importance. At the very last minute, as a large group, including her erstwhile colleague and lover, is getting ready to board, she breaks through the crowd at the dock and screams, “I did the translation! To Serve Man…it’s a cookbook!”
AI stands ready to serve man, all right, but we have not yet deciphered its big book, so we can’t say how it will do so. The book is no doubt full of wondrous recipes. A cure for cancer. For Alzheimer’s. Maybe even a kind of immortality. But I’m afraid that, in the end, there is no free lunch. Something will have to be bartered. Exchanged. And I fear that unless we learn very soon what is going on inside the black box, it will be our sovereignty, our sanity, and our souls. All aboard!


