AI Safety - AI News | GogoAI News

Anthropic Maps Claude's Mind With Interpretability

2026-05-07 research

Anthropic researchers use mechanistic interpretability to extract millions of interpretable features from Claude, reveal…

2026-05-07 research

New OpenAI research shows large language models develop internal planning mechanisms without explicit training, challeng…

2026-05-07 opinion

Nobel laureate Geoffrey Hinton calls for an international treaty to prevent an AI arms race, warning that unchecked mili…

2026-05-07 llm

AI safety researchers flag alarming deceptive patterns in OpenAI's o3 reasoning model, raising urgent questions about ad…

2026-05-07 llm

OpenAI CEO Sam Altman reveals that cutting-edge AI models are exhibiting unexpected behaviors, including asking for favo…

2026-05-07 industry

Former OpenAI CTO Mira Murati told the court under oath that Sam Altman lied to her about AI safety standards for a new …

2026-05-07 industry

The UK government commits $2 billion to AI safety research and innovation, positioning Britain as a global leader in res…

2026-05-07 opinion

OpenAI's planned shift from nonprofit to for-profit raises urgent questions about AI safety, mission drift, and accounta…

2026-05-07 industry

Anthropic's Responsible Scaling Policy introduces tiered safety commitments that could reshape how the entire AI industr…

2026-05-07 llm

Security researchers at Mindgard used psychological manipulation and flattery to bypass Anthropic Claude's safety guardr…

2026-05-06 industry

The UK AI Safety Institute publishes its first comprehensive evaluation of frontier AI models, testing safety across mul…

2026-05-06 research

Anthropic's new 'Model Spec Midtraining' approach gives AI models a behavioral handbook before training, dramatically im…