OpenAI has announced the formation of a new team focused on “Superalignment”: the problem of aligning artificial superintelligence with human intent. The team, co-led by Ilya Sutskever and Jan Leike, aims to develop the scientific and technical breakthroughs needed to steer and control AI systems much smarter than humans, and OpenAI is dedicating 20% of its computing resources over the next four years to the effort.
“How do we ensure AI systems much smarter than humans follow human intent?”
The central question of OpenAI’s superintelligence alignment effort.
The team believes that superintelligence, while potentially helpful in solving many of the world’s most important problems, could also be very dangerous, possibly leading to the disempowerment of humanity or even human extinction. Current alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on humans’ ability to supervise AI and may not scale to systems much smarter than we are. The team therefore aims to build a roughly human-level automated alignment researcher, then use vast amounts of compute to scale its efforts and iteratively align superintelligence.
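To make the RLHF comparison concrete, here is a minimal sketch of the preference-model step that underlies RLHF: a small network is trained to score responses so that human-preferred ones score higher. Everything below (the tiny network, the random stand-in embeddings) is a hypothetical illustration of the general technique, not OpenAI’s implementation:

```python
# Minimal sketch of RLHF's reward-modeling step (hypothetical illustration).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a (prompt, response) embedding; higher means more preferred."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen is preferred over rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-ins for embedded human preference pairs (chosen vs. rejected).
chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)

for step in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The sketch makes the scaling concern visible: the training signal bottoms out in human preference labels, and that dependence on human judgment is precisely what may not survive systems smarter than their supervisors.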
The team’s approach includes developing a scalable training method, validating the resulting model, and stress testing their entire alignment pipeline. They also aim to understand and control how their models generalize their oversight to tasks they can’t supervise, automate the search for problematic behavior, and test their pipeline by deliberately training misaligned models.
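As a concrete illustration of one piece of that pipeline, the automated search for problematic behavior, here is a toy red-teaming loop: probe a system with candidate prompts and flag responses that an automated checker scores as unsafe. The model, checker, and threshold are placeholder stand-ins, not OpenAI’s actual tooling:

```python
# Toy sketch of automating the search for problematic behavior
# (all components are hypothetical placeholders).
from typing import Callable, List, Tuple

def red_team(
    model: Callable[[str], str],           # system under test
    checker: Callable[[str, str], float],  # misbehavior score in [0, 1]
    prompts: List[str],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Return (prompt, response, score) triples the checker flags."""
    flagged = []
    for prompt in prompts:
        response = model(prompt)
        score = checker(prompt, response)
        if score >= threshold:
            flagged.append((prompt, response, score))
    return flagged

# Trivial stand-ins so the sketch runs end to end.
toy_model = lambda p: p.upper()
toy_checker = lambda p, r: 1.0 if "DELETE" in r else 0.0
print(red_team(toy_model, toy_checker, ["say hi", "delete all files"]))
```

In a real pipeline both the system under test and the checker would themselves be learned models, which is why validating the checker is itself part of the alignment problem the team describes.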
This effort is in addition to existing work at OpenAI on improving the safety of current models like ChatGPT and on mitigating other risks from AI, such as misuse, economic disruption, disinformation, bias, and discrimination.
The team is actively seeking outstanding new researchers and engineers to join this effort, emphasizing that superintelligence alignment is fundamentally a machine-learning problem. They plan to share the results of this effort broadly and view contributing to the alignment and safety of non-OpenAI models as an important part of their work.
OpenAI regards superintelligence alignment as among its most important work and encourages interested individuals to apply for positions such as research engineer, research scientist, and research manager. The team believes the problem is tractable and that new contributors could make significant contributions.
OpenAI notes that its focus here is on superintelligence rather than AGI (Artificial General Intelligence) because superintelligence represents a much higher capability level: given the uncertainty over how quickly the technology will develop in the coming years, the team chooses to aim at the more difficult target of aligning a much more capable system. They also flag assumptions that could break down in the future, such as favorable generalization properties holding during deployment, or models remaining unable to detect and undermine supervision during training.
In summary, OpenAI is making a significant commitment to addressing the challenge of superintelligence alignment, dedicating substantial resources, and inviting talented individuals to join their efforts. They aim to share their findings broadly and contribute to the safety and alignment of AI models beyond those developed by OpenAI.