Anthropic suggests slowing AI research until we can align it with human goals
AI could soon lead to systems capable of improving their own performance faster than humans can effectively supervise them, senior Anthropic researchers have warned in a new blog post titled “When AI builds itself.” That prospect revives concerns about the industry’s longstanding “alignment problem”: the challenge of ensuring that AI systems reliably pursue human goals.

Anthropic Institute lead Marina Favaro and Anthropic co-founder Jack Clark outlined three possible futures: growth in AI capabilities may flatten out; AI efficiency gains may continue to grow but expose bottlenecks elsewhere in software development; or AI systems may become capable of full recursive self-improvement and build their successors by themselves. It’s that third scenario that’s prompting them to suggest society be ready to hit the brakes on AI development.

“How the alignment problem gets solved — or not — in this future is something we are least certain about,” they wrote. Advanced, self-improving models could follow our needs and wants — or