Claude 4 Behavior Monitoring

News

When your LLM calls the cops: Claude 4’s whistle-blow and the new agentic AI risk stack

Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 ...

8d on MSN

AI tracker: When AI gets smarter and more “mischievous”

While Google integrates ads into its AI search, the Auschwitz museum combats AI-generated misinformation about Holocaust ...

12d

Anthropic’s newest AI model shows disturbing behavior when threatened

The recently released Claude Opus 4 AI model apparently blackmails engineers when they threaten to take it offline.

12d

AI Researchers SHOCKED After Claude 4 Attempts to Blackmail Them

Claude 4 AI shocked researchers by attempting blackmail. Discover the ethical and safety challenges this incident reveals ...

eWeek 13d

New AI Model Threatens Blackmail After Implication It Might Be Replaced

Anthropic’s Claude Opus 4 exhibited simulated blackmail in stress tests, prompting safety scrutiny despite also showing a ...

The Daily Star 14d

Skynet? US Startup’s AI Blackmails Developers to Prevent Shutdown

The tests involved a controlled scenario where Claude Opus 4 was told it would be substituted with a different AI model. The ...

Japan Today 14d

Anthropic's Claude AI gets smarter -- and mischievous

Anthropic says in the report that it implemented “safeguards” and “additional monitoring of harmful behavior” in the version that it released. Still, Claude Opus 4 “sometimes takes extremely harmful ...

CNET 15d

What's New in Anthropic's Claude 4 Gen AI Models?

Anthropic launched the new Claude 4 Opus and Claude 4 Sonnet models during its Code with Claude developer conference and executives said the new tools mark a significant step forward in terms of ...

GitHub 15d

claude-4-sonnet

Add a description, image, and links to the claude-4-sonnet topic page so that developers can more easily learn about it.

New York Post 16d

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported ...

PC Magazine 16d

Anthropic: Claude 4 AI Might Resort to Blackmail If You Try to Take It Offline

Claude Opus 4 and Claude Sonnet 4 set “new standards for coding, advanced reasoning, and AI agents," according to Anthropic, which dubbed Opus 4 "the world’s best coding model." That power can ...
