One of the biggest challenges researchers are currently grappling with is how to align AI models with the right human values. In the industry this is simply referred to as ‘alignment’.
If you have a simple AI that predicts the weather, alignment isn’t an issue; all it does is predict the weather, so human values are not relevant.
However, imagine a much more powerful AI that surpasses human intelligence and is capable of delivering huge change on Earth. In the industry this is referred to as Artificial General Intelligence (AGI).
Let’s say we give this AI the goal to “solve world hunger”.
A misaligned AI might come up with one of the following solutions:
- Come up with a recipe for a nutritionally rich but foul-tasting sludge that can be produced en masse to feed the poor
- Turn the Amazon rainforest into a massive industrial farm
- Modify human genetics to require less food
- Kill all of the hungry humans
If this AGI were powerful enough to actually put ideas like these into practice, that would be a scary prospect.
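Part of the problem is that a goal like “solve world hunger” has to be written down as an objective the AI can optimise, and a naive objective says nothing about *how* it should be achieved. Here is a deliberately silly sketch (hypothetical numbers, nothing to do with any real system) of why that matters: if the goal is just “minimise the number of hungry people”, an optimiser has no built-in reason to prefer feeding them over removing them.

```python
# Toy illustration of objective misspecification (hypothetical numbers).
# The "goal" is encoded naively as: minimise the number of hungry people.
# Nothing here distinguishes feeding people from removing them, so the
# optimiser is indifferent between humane and horrifying plans.

def hungry_people(world):
    return sum(1 for person in world["people"] if person["hungry"])

def plan_feed_everyone(world):
    return {"people": [{**p, "hungry": False} for p in world["people"]]}

def plan_remove_hungry(world):
    return {"people": [p for p in world["people"] if not p["hungry"]]}

world = {"people": [{"hungry": True}] * 700 + [{"hungry": False}] * 300}

plans = {"feed everyone": plan_feed_everyone, "remove the hungry": plan_remove_hungry}
for name, plan in plans.items():
    print(name, "->", hungry_people(plan(world)), "hungry people left")
# Both plans score a perfect 0 on the stated objective; the objective alone
# says nothing about which one we actually want.
```

Both “solutions” score identically on the stated goal, which is exactly the problem: the values we actually care about were never written into the objective.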
The challenge of teaching an AI to understand human values, and to come up with ideas that are aligned with them, is genuinely difficult because, at the end of the day, the AI is not human.
The AI could also end up significantly more intelligent than humans, and as such could well come to see humans as an inferior life form whose opinions and values don’t matter. To this AI, a human could be the equivalent of a chicken to us. Do we care about the values of a chicken? Not really: we factory-farm them in horrible conditions and eat them.
AI misalignment is already causing us a lot of headaches. For example, think about the AI models that generate tailored social media feeds. These are designed to detect when a user engages with a certain piece of content. They will then show more similar content to the user, with the goal of keeping the user on the platform (primarily so they can make money out of advertising).
The issue with this is that the content that engages users the most is typically divisive, emotionally triggering, and heavily aligned with each user’s existing belief system.
The AI will pick up on this type of content, and show more of it to the user, at the expense of alternative content that provides a different viewpoint.
This creates an echo-chamber that amplifies extreme beliefs and emotions, fosters division, and provides the perfect foundation for abuse by nefarious misinformation campaigns. Sound familiar?
Yes it does: this is exactly what we have seen happening across social media for almost a decade. The result? Extreme levels of political division across the whole of the West and a genuine threat to modern democracy.
If these models were well aligned with human values, they would understand the importance of providing a balanced view to their users, rather than reinforcing their often false and extreme views.
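To make the difference concrete, here is a deliberately simplified feed-ranking sketch (made-up posts and scores, not any real platform’s algorithm). The first ranker optimises engagement alone; the second pays a small, assumed “repetition penalty” each time it shows the same viewpoint again, which is one crude way a balance constraint could be expressed.

```python
# Simplified feed ranking sketch (made-up data, not any real platform's code).
# Each post has a predicted engagement score and a viewpoint label.
posts = [
    {"id": 1, "viewpoint": "A", "engagement": 0.95},  # divisive, matches the user's views
    {"id": 2, "viewpoint": "A", "engagement": 0.90},
    {"id": 3, "viewpoint": "A", "engagement": 0.85},
    {"id": 4, "viewpoint": "B", "engagement": 0.40},  # alternative viewpoint, less "sticky"
    {"id": 5, "viewpoint": "B", "engagement": 0.35},
]

def rank_by_engagement(posts, k=3):
    """Pure engagement maximisation: the misaligned objective."""
    return sorted(posts, key=lambda p: p["engagement"], reverse=True)[:k]

def rank_with_balance(posts, k=3, penalty=0.3):
    """Same ranking, but each repeated viewpoint pays a growing penalty."""
    feed, seen = [], {}
    candidates = list(posts)
    for _ in range(k):
        best = max(candidates,
                   key=lambda p: p["engagement"] - penalty * seen.get(p["viewpoint"], 0))
        feed.append(best)
        candidates.remove(best)
        seen[best["viewpoint"]] = seen.get(best["viewpoint"], 0) + 1
    return feed

print([p["id"] for p in rank_by_engagement(posts)])  # [1, 2, 3] -- an echo chamber
print([p["id"] for p in rank_with_balance(posts)])   # [1, 2, 4] -- the other viewpoint gets in
```

The point of the sketch is not the particular penalty term; it is that “show the user a balanced view” has to be written into the objective somewhere, because engagement alone will never put it there.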
Interestingly, the most advanced publicly available AI, GPT-4, is actually very well aligned. It is like an English butler: try asking it to do anything unethical and it will respond with a polite decline; ask it a political question and it will give a balanced answer covering multiple points of view. OpenAI worked very hard on this and, using a technique called Reinforcement Learning from Human Feedback (RLHF), they have achieved a good level of alignment. It’s heartening to see that alignment technology is catching up with these more advanced models.
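RLHF has several stages, and the sketch below only shows the core idea of one of them: fitting a reward model to human preferences. Humans compare pairs of model answers, and the reward model is trained so that the preferred answer in each pair tends to get the higher score. The data and the tiny logistic (Bradley–Terry style) model here are made up for illustration; this is not OpenAI’s actual pipeline.

```python
# Minimal sketch of the reward-model step in RLHF (toy data, not OpenAI's pipeline).
# Humans compare pairs of answers; we fit a reward model so the preferred answer
# in each pair tends to receive the higher reward (Bradley-Terry style loss).
import numpy as np

rng = np.random.default_rng(0)

# Toy "answer features" (in a real system these would come from the language model).
dim = 4
true_w = np.array([1.0, -2.0, 0.5, 3.0])                # hidden "human values" direction
chosen = rng.normal(size=(200, dim)) + 0.5 * true_w     # answers humans preferred
rejected = rng.normal(size=(200, dim)) - 0.5 * true_w   # answers humans rejected

w = np.zeros(dim)   # reward model parameters
lr = 0.1
for _ in range(500):
    # Probability the "chosen" answer wins under the current reward model.
    margin = chosen @ w - rejected @ w
    p_win = 1.0 / (1.0 + np.exp(-margin))
    # Gradient ascent on the log-likelihood of the human preference data.
    grad = ((1.0 - p_win)[:, None] * (chosen - rejected)).mean(axis=0)
    w += lr * grad

print("learned reward direction:", np.round(w, 2))
print("agrees with human preferences on", float((chosen @ w > rejected @ w).mean()), "of pairs")
```

In the full RLHF recipe, the chat model itself is then fine-tuned with reinforcement learning to maximise this learned reward; the sketch stops at the reward-model step, which is where the human values actually enter the process.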
The issue is that there are probably 10 or 20 more organisations hot on the heels of OpenAI, either government driven (some more oppressive than others) or profit focused. A lot of these organisations won’t put as much effort into alignment, and their models may end up causing more problems than they solve.
To avoid AI killing us all, it needs to be well aligned!