OpenAI said it will no longer assess its AI models before release for the risk that they could persuade or manipulate people, possibly helping to swing elections or create highly effective propaganda campaigns.
The company said it would now address those risks through its terms of service, which restrict the use of its AI models in political campaigns and lobbying, and by monitoring how people use the models for signs of violations once they are released.
OpenAI also said it would consider releasing AI models that it judged to be “high risk” as long as it has taken appropriate steps to reduce those dangers—and would even consider releasing a model that presented what it called “critical risk” if a rival AI lab had already released a similar model. Previously, OpenAI had said it would not release any AI model that presented more than a “medium risk.”
The changes in policy were laid out in an update to OpenAI’s “Preparedness Framework” yesterday. That framework details how the company monitors the AI models it is building for potentially catastrophic dangers—everything from the possibility that the models could help someone create a biological weapon, to their ability to assist hackers, to the chance that they could self-improve and escape human control.
The policy changes split AI safety and security experts. Several took to social media to commend OpenAI for voluntarily releasing the updated framework, noting improvements such as clearer risk categories and a stronger emphasis on emerging threats like autonomous replication and safeguard evasion.
However, others voiced concerns, including Steven Adler, a former OpenAI safety researcher who criticized the fact that the updated framework no longer requires safety tests of fine-tuned models. “OpenAI is quietly reducing its safety commitments,” he wrote on X. Still, he emphasized that he appreciated OpenAI’s efforts: “I’m overall happy to see the Preparedness Framework updated,” he said. “This was likely a lot of work, and wasn’t strictly required.”
Some critics highlighted the removal of persuasion from the dangers the Preparedness Framework addresses.
“OpenAI appears to be shifting its approach,” said Shyam Krishna, a research leader in AI policy and governance at RAND Europe. “Instead of treating persuasion as a core risk category, it may now be addressed either as a higher-level societal and regulatory issue or integrated into OpenAI’s existing guidelines on model development and usage restrictions.” It remains to be seen how this will play out in areas like politics, he added, where AI’s persuasive capabilities are “still a contested issue.”
Courtney Radsch, a senior fellow working on AI ethics at Brookings, the Center for International Governance Innovation, and the Center for Democracy and Technology, went further, calling the framework in a message to Fortune “another example of the technology sector’s hubris.” She emphasized that the decision to downgrade ‘persuasion’ “ignores context – for example, persuasion may be existentially dangerous to individuals such as children or those with low AI literacy or in authoritarian states and societies.”
Oren Etzioni, former CEO of the Allen Institute for AI and founder of TrueMedia, which offers tools to fight AI-manipulated content, also expressed concern. “Downgrading deception strikes me as a mistake given the increasing persuasive power of LLMs,” he said in an email. “One has to wonder whether OpenAI is simply focused on chasing revenues with minimal regard for societal impact.”
However, one AI safety researcher not affiliated with OpenAI told Fortune that it seems reasonable to address any risks from disinformation or other malicious uses of persuasion through OpenAI’s terms of service. The researcher, who asked to remain anonymous because he is not permitted to speak publicly without authorization from his current employer, added that persuasion and manipulation risks are difficult to evaluate in pre-deployment testing. He also pointed out that this category of risk is more amorphous and ambivalent than other critical risks, such as the risk that AI will help someone perpetrate a chemical or biological weapons attack or carry out a cyberattack.
It is notable that some Members of the European Parliament have also voiced concern that the latest draft of the proposed code of practice for complying with the EU AI Act downgraded mandatory testing of AI models for the possibility that they could spread disinformation and undermine democracy to a voluntary consideration.
Studies have found AI chatbots to be highly persuasive, although this capability is not necessarily dangerous in itself. Researchers at Cornell University and MIT, for instance, found that dialogues with chatbots were effective at getting people to question conspiracy theories.
Another criticism of OpenAI’s updated framework centered on a line where OpenAI states: “If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements.”
“They’re basically signaling that none of what they say about AI safety is carved in stone,” longtime OpenAI critic Gary Marcus said in a LinkedIn message, adding that the line forewarns a race to the bottom. “What really governs their decisions is competitive pressure—not safety. Little by little, they’ve been eroding everything they once promised. And with their proposed new social media platform, they’re signaling a shift toward becoming a for-profit surveillance company selling private data—rather than a nonprofit focused on benefiting humanity.”
Overall, it is useful that companies like OpenAI are sharing their thinking around their risk management practices openly, Miranda Bogen, director of the AI governance lab at the Center for Democracy & Technology, told Fortune in an email.
That said, she added she is concerned about moving the goalposts. “It would be a troubling trend if, just as AI systems seem to be inching up on particular risks, those risks themselves get deprioritized within the guidelines companies are setting for themselves,” she said.
She also criticized the framework’s focus on ‘frontier’ models, when OpenAI and other companies have used technical definitions of that term as an excuse not to publish safety evaluations of recent, powerful models. (For example, OpenAI released its GPT-4.1 model yesterday without a safety report, saying that it was not a frontier model.) In other cases, companies have either failed to publish safety reports or been slow to do so, publishing them months after the model has been released.
“Between these sorts of issues and an emerging pattern among AI developers where new models are being launched well before or entirely without the documentation that companies themselves promised to release, it’s clear that voluntary commitments only go so far,” she said.
This story was originally featured on Fortune.com