Michael Nielsen published a blog post on existential risks from AI. A lot of AI risk discussion is kind of rote, so it was refreshing to read Nielsen’s especially first-principles-y perspective. Some highlights below:
On p(doom) as a “conceptual hazard”:
The term “probability of doom” began frustrating me after starting to routinely hear people at AI companies use it fatalistically, ignoring the fact that their choices can change the outcomes. “Probability of doom” is an example of a conceptual hazard – a case where merely using the concept may lead to mistakes in your thinking. Its main use seems to be as marketing: if widely-respected people say forcefully that they have a high or low probability of doom, that may cause other people to stop and consider why. But I dislike concepts which are good for marketing, but bad for understanding; they foster collective misunderstanding, and are likely to eventually lead to collective errors in action.
On the promise and peril of decentralized approaches to safety:
Many people like to talk about making ASI systems safe and aligned; quite apart from the difficulty in doing that (or even sensibly defining that) it seems it must be done for all ASI systems, ever. That seems to require an all-seeing surveillance regime, a fraught path. Perhaps such a surveillance regime can be implemented not merely by government or corporations against the populace, but in a much more omnidirectional way, a form of ambient sousveillance.
and:
Many of those people argue that the tech industry has concentrated power in an unhealthy way over the past 30 years. And that open source mitigates some of that concentration of power. This is sometimes correct, though it can fail: sometimes open source systems are co-opted or captured by large companies, and this may protect or reinforce the power of those companies. Assuming this effect could be avoided here, I certainly agree that open source approaches might well help with many important immediate concerns about the fairness and ethics of AI systems. Furthermore, addressing those concerns is an essential part of any long-term work toward alignment. Unfortunately, though, this argument breaks down completely over the longer term. In the short term, open source may help redistribute power in healthy, more equitable ways. Over the long term the problem is simply too much power available to human beings: making it more widely available won’t solve the problem, it will make it worse.
On the relationship between AI alignment and capabilities:
With all that said: practical alignment work is extremely accelerationist. If ChatGPT had behaved like Tay, AI would still be getting minor mentions on page 19 of The New York Times. These alignment techniques play a role in AI somewhat like the systems used to control when a nuclear bomb goes off. If such bombs just went off at random, no-one would build nuclear bombs, and there would be no nuclear threat to humanity. Practical alignment work makes today’s AI systems far more attractive to customers, far more usable as a platform for building other systems, far more profitable as a target for investors, and far more palatable to governments. The net result is that practical alignment work is accelerationist. There’s an extremely thoughtful essay by Paul Christiano, one of the pioneers of both RLHF and AI safety, where he addresses the question of whether he regrets working on RLHF, given the acceleration it has caused. I admire the self-reflection and integrity of the essay, but ultimately I think, like many of the commenters on the essay, that he’s only partially facing up to the fact that his work will considerably hasten ASI, including extremely dangerous systems.
Over the past decade I’ve met many AI safety people who speak as though “AI capabilities” and “AI safety/alignment” work is a dichotomy. They talk in terms of wanting to “move” capabilities researchers into alignment. But most concrete alignment work is capabilities work. It’s a false dichotomy, and another example of how a conceptual error can lead a field astray. Fortunately, many safety people now understand this, but I still sometimes see the false dichotomy misleading people, sometimes even causing systematic effects through bad funding decisions.
To play out Nielsen’s point a bit more, a drawback of alignment work also advancing capabilities is that it becomes harder to pause capabilities research while alignment research catches up. The flipside, however, is that the incentives to increase capabilities also function as incentives to work on alignment. Put another way, companies building and using frontier models have instrumental reasons to get better at alignment, because a less reliable model is a less useful one. See also this post on the USV blog.
On the Great Oxidation Event:
In fact, something at least a little like Ice-9 has occurred on Earth before. About 2.4 billion years ago the Earth’s atmosphere contained just trace amounts of oxygen. It was at that time that evolution accidentally discovered the modern photosynthetic pathway, with some cyanobacteria evolving the ability to convert CO2 and sunlight into energy and oxygen. This was terrific for those lifeforms, giving them a much improved energy source. But it was very likely catastrophic for other lifeforms, who were poisoned by the oxygen, causing a mass extinction. It must have seemed like a slow suffocation; it was much slower than the takeover by Ice-9 described by Vonnegut, but it was perhaps even more enveloping. It was perhaps the first victory by a grey goo on Earth.
On a priori and a posteriori reasoning:
Suppose we had a perfect theory of everything and infinite computational power. Even given those resources, our ability to do (say) medicine would be surprisingly constrained. The reason is that we simply can’t deduce the nature of the human body (either in general, or in a specific instance) using those resources alone: it depends a tremendous amount on contingent facts: of our evolutionary history, of the origin of life, of our parents, of the environment we grew up in, and so on. Put another way: you can’t use pure theory to discover that human beings are built out of DNA, RNA, the ribosome, and so on. Those are experimental facts we need to observe in the world.
and
… with the discovery of public-key cryptography: it’s not so much engineering as the discovery of an entirely new phenomenon, in some sense lying hidden within existing ideas about computation and cryptography. Or consider the question of whether we could have predicted the existence of liquid water – and phenomena like the Navier-Stokes equations, turbulence, and so on – from the Schroedinger equation alone? Over the last 20 years the study of water from first principles has been a surprisingly active area of research in physics. And yet, as far as I know, we are not yet at the point where we can even deduce the Navier-Stokes equations, much less other phenomena of interest. I believe it is principally science (not engineering) when such emergent phenomena and emergent principles are being discovered.
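As an aside of my own (not from Nielsen’s post), it may help to see the two descriptions he is contrasting written out. The sketch below, in standard textbook form, shows the microscopic many-body Schroedinger equation for water’s constituent particles and the emergent, macroscopic incompressible Navier-Stokes equations for its bulk flow; the point is how little the second visibly resembles the first, even though it is in principle contained in it.

```latex
% Microscopic description: many-body Schroedinger equation for the N
% particles (nuclei and electrons) making up a body of water, where V is
% the Coulomb interaction between all pairs of charged particles.
\[
  i\hbar \, \frac{\partial \Psi}{\partial t}
  = \Big( -\sum_{j=1}^{N} \frac{\hbar^2}{2 m_j} \nabla_j^2
          + V(\mathbf{r}_1, \dots, \mathbf{r}_N) \Big) \Psi
\]

% Emergent, macroscopic description: incompressible Navier-Stokes
% equations for the fluid's velocity field u and pressure p, with
% density rho and viscosity mu.
\[
  \rho \Big( \frac{\partial \mathbf{u}}{\partial t}
             + (\mathbf{u} \cdot \nabla)\, \mathbf{u} \Big)
  = -\nabla p + \mu \nabla^2 \mathbf{u},
  \qquad
  \nabla \cdot \mathbf{u} = 0
\]
```

Nothing in the second set of equations is labeled anywhere in the first; getting from one to the other is exactly the kind of discovery of emergent principles Nielsen is calling science rather than engineering.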
And much more – the whole post is definitely worth reading!