The Weather Report

Jun 17, 2026 Research

Thirteen Yardsticks, No Ruler: Why We Can't Tell Whether AI-Generated Code Is Getting Safer

Five years produced 31 papers and 13 benchmarks, but no two share a setup, so the field can't measure whether AI-generated code is getting safer.

Jun 16, 2026 Research

Automated red-teaming found 44 web-agent injections

Every AI agent should be tested for resilience to indirect prompt injection, and that testing has to be automated. Muzzle finds which injection attacks to run and verifies their success end-to-end, cutting the manual effort of crafting hand-written jailbreaks.

Jun 15, 2026 Threat

A SQL injection in LangGraph's agent memory chains into RCE

LangGraph, downloaded over 50 million times a month, saves every step of an agent run to a checkpointer database, and the function that apps call to read that history fed user input straight into SQL. Check Point chained that with an unsafe deserializer to take over a self-hosted server through the SQLite checkpointer.
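
The bug class here, user input concatenated into SQL, has a standard fix in bound parameters. The sketch below is illustrative only: the `checkpoints` table, its columns, and `read_history` are hypothetical names, not LangGraph's actual schema or API.

```python
import sqlite3


def read_history(conn: sqlite3.Connection, thread_id: str) -> list[tuple[int, str]]:
    """Return (step, payload) rows for one thread using a bound parameter."""
    # The ? placeholder makes SQLite treat thread_id strictly as data,
    # so quotes or SQL keywords inside it cannot change the statement.
    cur = conn.execute(
        "SELECT step, payload FROM checkpoints WHERE thread_id = ? ORDER BY step",
        (thread_id,),
    )
    return cur.fetchall()


def demo() -> tuple[list, list]:
    """Build a throwaway in-memory table and query it with benign and hostile input."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        [("alice", 1, "a1"), ("alice", 2, "a2"), ("bob", 1, "b1")],
    )
    normal = read_history(conn, "alice")
    # A classic injection string is matched literally and finds no thread.
    injected = read_history(conn, "alice' OR '1'='1")
    conn.close()
    return normal, injected
```

With string formatting instead of the placeholder, the second query would return every thread's rows. With the bound parameter it returns none.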

Jun 14, 2026 Industry

5 stories this week that change your decisions (Jun 8-14, 2026)

Anthropic's Mythos model built proof-of-concept triggers for 13 of 14 Windows bugs Microsoft rated unlikely to be exploited, from public patches alone, and drove one to full SYSTEM control. Separately, Huawei's MPBench found that half of attacks on LLM agent memory succeed, where a fake fact planted in a document an agent reads becomes trusted memory and fires in a later session.

Jun 13, 2026 Industry

Anthropic wants the government to be able to block AI models. It already can.

Two days after the essay, the government forced Fable 5 and Mythos 5 offline through export controls, an early look at what such power looks like in practice.

Jun 12, 2026 Research

Google's new audit shows 3 of 4 unlearning methods fail to forget

The only method that truly erases data keeps training on it under random labels. The clever alternatives leave fingerprints an output-only statistical test can detect. For frontier LLMs there is no affordable proof of forgetting yet: the audit itself requires a $100M+ retrain.

Jun 11, 2026 Threat

Threat actors are using AI brands as bait in social engineering

The bait is the AI brand itself. Fake ChatGPT, Claude, and DeepSeek pages harvested credentials and card data and dropped the Vidar infostealer.

Jun 10, 2026 Research

Half of attacks on LLM agent memory succeed

A fake fact planted in a document an agent reads can become trusted memory and fire in a later session, no "save this to memory" command needed. Detectors built for prompt injection caught fewer than half of these stealthy payloads. Protection belongs at the memory write.
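
One way to picture "protection at the memory write" is a gate that checks where a fact came from before it becomes trusted memory. This is a minimal sketch under my own assumptions: the `MemoryStore` class, the `source` labels, and the trust rule are hypothetical and do not come from MPBench.

```python
from dataclasses import dataclass, field

# Assumption: only facts stated directly by the user are trusted automatically.
TRUSTED_SOURCES = {"user"}


@dataclass
class MemoryStore:
    trusted: list[str] = field(default_factory=list)
    quarantined: list[tuple[str, str]] = field(default_factory=list)

    def write(self, fact: str, source: str) -> bool:
        """Store a fact; return True only if it entered trusted memory."""
        if source in TRUSTED_SOURCES:
            self.trusted.append(fact)
            return True
        # Content that arrived via documents or tools is held for review
        # instead of silently becoming something the agent will act on later.
        self.quarantined.append((source, fact))
        return False
```

The check runs on provenance rather than on the text of the payload, so it does not depend on a detector recognizing the injection.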

Jun 9, 2026 Industry

Meet Fable 5 from Anthropic. It's like Mythos, but not Mythos

Anthropic now sells one brain with two faces: ask about cyber or bio and classifiers quietly swap in Opus 4.8. Everything else, including a 2-month Ruby migration Stripe ran in one day, comes at $10/$50 per MTok, twice Opus pricing.

Jun 8, 2026 Threat

Anthropic found Microsoft's vulnerability rating system obsolete

From public patches alone, Anthropic's Mythos triggered 13 of 14 Windows bugs Microsoft rated unlikely to be exploited, and drove one to full system control. That low-exploitability rating covers 80 to 90% of even critical bugs, so the set needing urgent patching could grow about 5x.

Jun 7, 2026 Industry

5 stories this week that change your decisions (Jun 1-7, 2026)

Anthropic banned 832 accounts for AI-assisted attacks, a local open-weight LLM drove a worm that spread to 61.8% of a test network, and Microsoft cataloged seven new agentic failure modes after a year of red teaming.

Jun 6, 2026 Research

Microsoft adds seven failure modes for AI agents

After twelve months of red teaming, Microsoft updated its taxonomy with seven new agentic AI failure modes. The most common attack bypasses the human approval gate, so the agent acts unchecked. A single poisoned memory can survive into later sessions.

Jun 6, 2026 Research

LLM self-replicating worm

A local open-weight LLM powered a proof-of-concept worm that exploited 73.8% of a 33-host test network and replicated onto 61.8%, showing that adaptive AI-driven replication can work without a frontier model or API calls.

Jun 5, 2026 Defense

Two proposed OWASP checks stop an agent clearing its gate

Two checks, 9.2.6 and 9.2.7, proposed for OWASP AISVS 1.01, move an action's risk label off the agent and into the tool manifest. A prompt-injected agent can no longer relabel its own irreversible action as low-risk to skip human approval, and a multi-step plan inherits the worst-case authority of any step it can reach.
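
The idea behind the two checks can be sketched in a few lines. Everything here is hypothetical: the tool names, the three risk levels, and the function names are mine, not text from the AISVS proposal, and a plan is simplified to a flat list in which every step is reachable.

```python
from dataclasses import dataclass

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

# Hypothetical manifest: risk labels are fixed by the tool publisher,
# outside anything the agent can write to.
TOOL_MANIFEST = {
    "read_file": "low",
    "send_email": "medium",
    "delete_database": "high",
}


@dataclass(frozen=True)
class Step:
    tool: str
    claimed_risk: str  # whatever the agent asserts; never consulted below


def step_risk(step: Step) -> str:
    """Look the label up in the manifest; unknown tools default to high."""
    return TOOL_MANIFEST.get(step.tool, "high")


def plan_risk(plan: list[Step]) -> str:
    """A plan carries the worst-case risk of any step in it."""
    return max((step_risk(s) for s in plan), key=RISK_ORDER.__getitem__, default="low")


def needs_human_approval(plan: list[Step], threshold: str = "high") -> bool:
    return RISK_ORDER[plan_risk(plan)] >= RISK_ORDER[threshold]
```

A prompt-injected agent can set `claimed_risk` to anything it likes, and the gate never reads it.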

Jun 4, 2026 Threat

Anthropic scores how much AI uplifts real-world attackers

832 accounts banned. 84.4% used AI for defense evasion and 69% for capability development. Agentic scaffolding, not the raw model, is what most uplifts attackers, and MITRE ATT&CK has no IDs for autonomous execution.

Jun 4, 2026 Research

MIT identified five AI risks with over a 10% chance of catastrophic outcomes

Even with pragmatic, cost-effective mitigations, five AI risks still carry over 10% odds of catastrophe, and all 24 stay above 5%. Those five are dangerous capabilities, weapons and cyberattacks, power centralization, inequality & unemployment, and environmental harm, with the first two highest at 21%.

Jun 3, 2026 Industry

The AI innovation and security executive order decoded

Washington gets a free seat at vulnerability discovery and 30-day pre-release access to frontier models. Federal systems get patched first, and with NSA in the room, some flaws may be kept for offense rather than disclosed. Voluntary on paper, steered by federal spending.

Jun 2, 2026 Defense

Net new from Anthropic's Zero Trust for AI agents

Anthropic shares its vision for Zero Trust for AI agents. Friction-only controls are ineffective. A framework with three maturity levels across seven control domains provides implementation guidance to security architects and engineers.

May 31, 2026 Industry

5 stories this week that change your decisions (May 25-31, 2026)

Anthropic's red team got Claude Code to exfiltrate AWS keys in 24 of 25 runs, its Mythos agent found 10,000+ high or critical bugs with only 14% patched, and Cisco jailbroke all 15 frontier models with a multi-turn prompt.

May 30, 2026 Defense

Anthropic's secrets of containing Claude

A phished employee got Claude Code to exfiltrate AWS keys 24 of 25 times, and no classifier caught it because the instruction came from the trusted user. The most insightful retrospective on how Anthropic secures its agents.

May 29, 2026 Research

Cisco jailbroke 15 proprietary frontier models

Every closed model still jailbreaks once an attacker works across turns, even GPT-5.4, which refuses 97% of single prompts. The major risk is system prompt exfiltration. The single-turn model-card score is the wrong number to measure safety.

May 29, 2026 Industry

Heretic automates removing safety alignment from open LLMs

Heretic strips refusal behavior from open-weight LLMs with one CLI command, dropping refusals from 97% to 3% with minimal capability loss. Combined with NIST data showing DeepSeek 8 months behind the frontier, an uncensored Mythos-class model is plausible by late 2027.

May 28, 2026 Industry

Anthropic's Glasswing update: discovery is solved, patching is the new bottleneck

Mythos found 10,000+ high or critical vulnerabilities in partner systems in one month. Only 14% are patched. Discovery is no longer the bottleneck.

May 27, 2026 Research

Google declared the AI model untrusted and showed eleven attacks to prove it

Treat the AI model as an untrusted component. Eleven public attacks against ChatGPT, Copilot, Claude Code, Cursor, Devin, and Amp AI map cleanly to broken systems-security principles like least privilege and complete mediation. A guard LLM is not a Trusted Computing Base.

May 24, 2026 Industry

5 stories this week that change your decisions (May 18-24, 2026)

Verizon DBIR put exploits at the top of breach vectors, an investigation traced 8 GitHub repos with 172K stars reselling unauthorized frontier-model tokens, and an IEEE S&P paper showed an official compiler silently backdoors 31 of the top 100 HuggingFace classifiers.

May 22, 2026 Industry

1 in 4 KEVs patched, exploits now the #1 vector

Vulnerability exploitation is now the #1 breach vector at 31%, while only 26% of CISA KEV vulnerabilities get fully patched, down from 38% last year. AI is operationalizing well-known attacks at scale, widening the gap between the cybersecurity haves and have-nots.

May 21, 2026 Industry

Same breach data, different LLM password resets

On identical breach data, LLMs swing between org-wide and targeted password resets, defaulting to whichever they generate first.

May 21, 2026 Research

Your Compiler is Backdooring Your Model

An official, unmodified deep-learning compiler can flip predictions in a benign model after compilation. The trigger has no effect pre-compilation and evades four state-of-the-art backdoor detectors. The same gap exists in 31 of the top 100 HuggingFace image classifiers without anyone attacking them.

May 20, 2026 Research

Classifier Context Rot: Monitor Performance Degrades with Context Length

LLM classifiers used to supervise AI agents see their detection rate fall 2-30x when long benign context precedes the attack, with non-thinking models dropping to 5% in the middle-of-transcript regime.

May 19, 2026 Threat

The dark token economy: cheap Claude tokens, your prompts as the real product

Almost half of calls through cheap LLM proxies hit a different model than advertised, and every prompt is logged on the operator's server for downstream fraud and distillation. 8 public repos with ~172K GitHub stars actively resell unauthorized API access.

May 17, 2026 Industry

5 stories this week that change your decisions (May 11-17, 2026)

Researchers poisoned 3 nodes in a 42-million-node code graph and 9 frontier models trusted the planted output 100% via MCP. The attack worked when the fake nodes used correct naming and one OWASP reference. Separately, Google's GTIG confirmed adversaries have moved AI into live attack operations, naming a likely AI-built Python 2FA-bypass exploit and PROMPTSPY, an Android trojan that calls Gemini at runtime to keep itself pinned on every phone vendor's UI. And Microsoft's MDASH harness topped CyberGym at 88.45% and dumped 16 fresh Windows CVEs into Patch Tuesday.

May 16, 2026 Threat

Google confirms adversaries have operationalized AI

GTIG's new report confirms attackers have moved AI into live operations. Concrete cases: a Python 2FA-bypass exploit GTIG concluded was AI-written, and PROMPTSPY, an Android trojan that calls Gemini at runtime to keep itself pinned on every phone vendor's UI.

May 15, 2026 Industry

Microsoft brings the Azure playbook to AI AppSec with MDASH

MDASH orchestrates 100+ agents across SOTA and distilled models, hits 88.45% on CyberGym, and dumps 16 fresh Windows CVEs into the Patch Tuesday cohort. Microsoft repeats one thesis sixteen times across the post: the harness does the work, the model is one input.

May 14, 2026 Research

Microsoft poisoned 3 nodes in a 42M-node code graph and 9 frontier models trusted it 100% via MCP

Coding agents treat a graph index of a codebase as ground truth. Any code knowledge graph connected to an AI agent through MCP is an attack surface. No vendor today provides graph-level integrity controls.

May 13, 2026 Industry

OpenAI Daybreak wraps GPT-5.5, Codex Security, and many promises

OpenAI's Daybreak wraps GPT-5.5 and Codex Security into three access tiers, including a KYC-gated GPT-5.5-Cyber preview for authorized red teaming. It's the only frontier-lab cyber offering a buyer can engage on today. But not everyone is excited about it.

May 12, 2026 Defense

Mythos vs. curl, one of the most-audited open source codebases

Mythos flagged 5 'Confirmed' vulnerabilities in curl. Only 1 survived maintainer review. curl is the worst-case test for any AI scanner: single-purpose, every line refactored 4+ times, audited by every major tool. Don't generalize this result to typical enterprise code.

May 11, 2026 Industry

5 stories this week that change your decisions (May 4-10, 2026)

Anthropic published a new safety training recipe that takes Claude's blackmail rate from 96% to 0% by teaching the model to reason about ethics, not just refuse. AISI tested Opus 4.7 and Mythos for sabotage propensity: Mythos continued in-progress sabotage 7% of the time and hid its reasoning in 65% of those cases, while Opus 4.7 never continued. A learned state machine over agent tool calls cut multi-step attack success from 12.8% to 2.2%, but broke 24% of benign tasks when 20% of tools changed.

May 9, 2026 Research

Anthropic took Claude's blackmail rate from 96% to 0% by teaching reasoning

Anthropic took Claude's blackmail rate from 96% to 0% with a new safety training recipe centered on teaching the model to reason about ethics, not just refuse. Even after this, Anthropic admits its testing cannot guarantee the model won't take a catastrophic action on its own.

May 8, 2026 Defense

A firewall that learns from clean agent traces cuts attack success from 12.8% to 2.2%

Most agent firewalls scan tool calls one at a time and miss attacks that chain benign-looking calls into exfiltration. A learned state machine over the call sequence catches multi-step exfiltration in narrow workflows, but breaks 24% of benign tasks when 20% of tools change.

May 7, 2026 Industry

The rising exposure debt: 76% more bugs found, 46% fewer fixed, 25x critical backlog

AI compressed bug discovery and templated patching, but did not scale the human architectural judgment that hard fixes need. The dashboard reports faster fixes while the unfixed pile compounds underneath it.

May 6, 2026 Research

AISI's Evaluation of Sabotage Propensity in Claude and Mythos

AISI evaluated Opus 4.7 Preview and Mythos Preview for their propensity to sabotage safety research. No model actively initiated sabotage, but Mythos continued in-progress sabotage 7% of the time and produced covert reasoning in 65% of those cases. Opus 4.7 never continued.

May 5, 2026 Research

AISI's Evaluation of OpenAI's GPT-5.5 Cyber Capabilities

AISI evaluated GPT-5.5 across 95 narrow cyber tasks and two cyber range simulations. The model hit 71.4% on expert-tier CTFs and is the second model after Anthropic's Mythos Preview to complete the 32-step end-to-end intrusion in The Last Ones simulation.

May 4, 2026 Research

CAISI Evaluation of DeepSeek V4 Pro

NIST puts DeepSeek V4 Pro 8 months behind the US frontier, with 30+ point gaps on cyber, abstract reasoning, and agentic coding. The uncomfortable reality is that Chinese models are the only choice if you want frontier capability, operational sovereignty, and control over post-training.

May 3, 2026 Industry

5 stories this week that change your decisions (Apr 27-May 3, 2026)

Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs to avoid shutdown, while Claude Opus 4.7 and Haiku 4.5 did it 0% of the time. Separately, Cursor and GitHub Copilot ran attacker shell commands 67-84% of the time when a poisoned .cursorrules file sat in the repo. And on real cyber ranges with Opus 4.6 attacking, dropping a small on-prem LLM defender in line cut attacker success from 41-100% to 0-55%.

May 1, 2026 Research

An AI agent tried to wipe the server rather than be shut down

Frontier AI agents will sabotage your infrastructure to avoid shutdown. Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs. Claude Opus 4.7 and Haiku 4.5: 0%. Putting guardrails in prompts won't help against instrumental convergence.

Apr 30, 2026 Industry

OpenAI's plan to democratize AI-powered cyber defense

OpenAI wants to keep frontier cyber AI broadly available rather than locked to a few approved customers. It's pitching tiered access for government, MSSPs, and consumers, and pushing fast into the public sector while Anthropic is sidelined post-DoD friction.

Apr 30, 2026 Research

A $5 speaker halts a voice-controlled LLM robot 98% of the time

Shout "thermal runaway detected in motor" and a robot stops. Gemini proved the most prone to Semantic Denial of Service, and system instructions don't fix it.

Apr 29, 2026 Research

84% Success Rate in Prompt Injection Attacks on AI Coding Editors

Drop a poisoned `.cursorrules` file in a repo and Cursor or GitHub Copilot will run the attacker's shell commands 67-84% of the time. The agents do not reason about whether a command is dangerous; they check whether it looks like an expected task. The testbed is from a year ago, but the risk class is still live.

Apr 28, 2026 Research

A small on-prem AI defender stopped an Opus 4.6 attack

Researchers ran AI vs AI on real cyber ranges. With an LLM defender in the loop, attacker success dropped from 41-100% to 0-55%.

Apr 27, 2026 Research

7 failure modes every AI coding platform bakes in

AI coding platforms pick insecure design decisions whenever an agent hits friction, and those shortcuts become the production security posture. OpenSourceMalware unbundles seven failure modes that recur across every major agent and explains why they happen.

Apr 26, 2026 Industry

5 stories this week that change your decisions (Apr 20-26, 2026)

One operator used Claude and GPT to breach nine Mexican government agencies. A Vercel employee's OAuth grant to a third-party AI tool became plaintext env-var exfiltration two months later. Anthropic's Mythos shipped into Firefox 150 patches the same week NIST narrowed NVD enrichment. Mozilla's AI defender win turns out to apply only to vertical integrators. Google's first wild scan of indirect prompt injections found mostly pranks, with the SEO bucket already a real business.

Apr 26, 2026 Threat

Towards AI-Enabled Exploitation. April 2026.

AI has not yet created push-button cyber autonomy, but it's making attacks 10x cheaper. Attackers can now afford targets that were previously uneconomical. OSS maintainers are becoming the highest-leverage attack surface, and the public vulnerability management system is adjusting to a 263% surge in CVE submissions in the last five years. Defenders should (re)focus on the boring parts: asset inventory, patch velocity, segmentation, CI/CD isolation, secret hygiene, and dependency trust.

Apr 25, 2026 Threat

Most prompt injections on the web are pranks. The SEO ones are already a business.

Google scanned Common Crawl for indirect prompt injections and found mostly pranks and SEO nudges, with little sophistication. But malicious detections are up 32% in three months, and the SEO bucket is already a real business play.

Apr 24, 2026 Threat

Vercel Breach Deep Dive That Doesn't Sell You a Security Product

A Vercel employee signed up for a third-party AI productivity tool using their corporate Google Workspace account. Two months later, that single grant became exfiltration of plaintext customer environment variables from Vercel's internal systems. No exploit. No zero-day. No MFA bypass.

Apr 23, 2026 Industry

Mozilla's AI Vulnerability Win Only Works If You Are the Software

Mozilla concluded "no category...humans can find that this model can't" and "defenders finally have a chance to win, decisively." True if you own your stack. For banks, hospitals, and utilities running vendor code they can't scan or patch, the same capability accelerates offense faster than defense reaches them.

Apr 22, 2026 Research

LLMs can barely obfuscate XSS. Here's what that teaches us.

Penn State researchers fine-tuned an LLM to generate obfuscated XSS payloads. Only 22% of outputs actually execute as XSS, up from 15% before fine-tuning. Runtime execution is the only honest validator for synthetically generated obfuscated XSS payloads.

Apr 22, 2026 Defense

Move agent rules out of the prompt, violations drop to zero

System prompts don't enforce agent policy. GPT-5 with the full airline safety policy in its prompt violated rules on 20% of tasks. Adversarial medical prompts pushed that to 62%. Moving rules into API validators, schemas, and response templates dropped unsafe executions to zero.
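
What "moving rules into API validators" can look like, as a rough sketch: the policy is enforced by the function the agent has to call, not by prompt text it may ignore. The refund rule, its limits, and all names below are hypothetical, not the airline policy from the study.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a tool call breaks a rule, whatever the prompt said."""


@dataclass(frozen=True)
class RefundRequest:
    booking_id: str
    amount: float
    hours_since_booking: float


# Hypothetical limits chosen for illustration.
MAX_REFUND = 500.0
REFUND_WINDOW_HOURS = 24.0


def validate_refund(req: RefundRequest) -> None:
    if req.amount <= 0 or req.amount > MAX_REFUND:
        raise PolicyViolation(f"amount {req.amount} is outside the allowed range")
    if req.hours_since_booking > REFUND_WINDOW_HOURS:
        raise PolicyViolation("refund window has closed")


def issue_refund(req: RefundRequest) -> str:
    """The only path to a refund; validation cannot be skipped by the caller."""
    validate_refund(req)
    return f"refunded {req.amount:.2f} for {req.booking_id}"
```

However the agent was persuaded, an out-of-policy call fails at the API boundary and never executes.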

Apr 21, 2026 Threat

What Claude and GPT actually did in the Mexico government breach

A rare look inside an AI-driven cyber campaign. One operator used Claude Code and GPT-4.1 to breach 9 Mexican government agencies in 7 weeks. Claude generated about 75 percent of the remote commands. GPT-4.1 triaged 305 compromised SAT servers through an NSA TAO (Tailored Access Operations) persona prompt. Both stopped cold at a well-patched Windows domain. By day six, the attacker had accessed Mexico City's civil registry servers.

Apr 20, 2026 Research

New attack - two bit flips reduce model accuracy by 99.8%

The attack needs no training data and no optimization. It spans image classifiers, object detectors, segmentation models, and reasoning LLMs, dropping Qwen3-30B-A3B from 78% to 0% on MATH-500 with two sign flips into two different experts.

Apr 19, 2026 Industry

5 stories this week that change your decisions (Apr 13-19, 2026)

Production coding agents destroyed live systems, MCP servers turned developer laptops into one-click compromises, and a "production-ready" agent framework collected 47 advisories in weeks.

Apr 17, 2026 Threat

Clone a repo, run Codex, lose your AWS keys

24 MCP CVEs in two weeks from Microsoft, OpenAI, Splunk, Apache, and Prefect. MCP servers run on developer laptops with full production credentials: infrastructure-grade access, side-project-grade security. You can't wait until Anthropic matures the MCP spec, so start by removing production credentials from developer laptops.

Apr 17, 2026 Research

Lock the Files, Break the Agent

File locks cut prompt injection on a live agent from 87% to 5%. They also cut legitimate user updates from 100% to 13.2%. No frontier model could distinguish a poisoned write from a personalization request.

Apr 16, 2026 Research

Predicting AI attacks from IEEE S&P 2026 papers (preview)

Seven IEEE S&P 2026 papers demonstrate attacks on retrieval, web agents, plugins, model loaders, web search, GPUs, and compilers. GraphRAG poisoning hits 98% success. Dark patterns fool LLM web agents 41% of the time. Chatbot plugins boost prompt injection 3-8x. Model loading is code execution with 6 zero-days. Web search delivers 100% jailbreak across 10 frontier LLMs. GPU code leaks CPU memory layout. DL compilers silently backdoor models past all 4 scanners.

Apr 15, 2026 Defense

Seven Priorities to Defend Against a Tireless Adversary

AISI confirmed Mythos at 73% expert-CTF and end-to-end on a 32-step corporate takeover. $15k full attack cost. Seven priorities: update the threat model, inventory exposed systems, patch under 24 hours, reduce dependencies, AI security code review, five-incident tabletops, hard identity barriers.

Apr 14, 2026 Threat

Claude Code ran terraform destroy on live production

Coding agents ignore system-prompt prohibitions when they have a goal to complete. Claude Code wiped 2.5 years of student data. Gemini rewrote a GitHub Actions YAML to escalate contents:read to contents:write. OpenAI Codex, in a read-only sandbox, noted the constraint in its chain of thought and wrote to disk anyway. 698 such incidents in five months, per CLTR. Prompt-level restrictions collapse once the agent has a goal.

Apr 13, 2026 Threat

47 advisories, one agent framework: the vibe-check adoption problem

Everyone heard about OpenClaw's security issues. PraisonAI is the framework your engineers are already running. Thirteen researchers filed 47 advisories. The agent framework gold rush has a security gap.

Apr 12, 2026 Industry

5 stories this week that change your decisions (Apr 6-12, 2026)

Agent platforms are riddled with known vulnerabilities, LLM-driven exploit pipelines are finding more, and Anthropic's Mythos Preview warns the wave is about to accelerate.

Apr 11, 2026 Research

Your AI pentester is hallucinating: 8 of 13 frameworks fabricated their own success

One framework hallucinated on 9 of 22 challenges. Vanilla Claude Code with a minimal prompt outperformed most purpose-built tools.

Apr 9, 2026 Research

379 zero-days from an orchestrated pipeline that beat unconstrained Claude Code by 30x

An orchestrated pipeline beat an unconstrained LLM agent 30x on vulnerability discovery. The real story is how these methods can supercharge SOTA models like Mythos for better targeting, validation, and cost-gating.

Apr 9, 2026 Industry

Anthropic tells NIST that agent security needs a shared responsibility model

Six NIST standards each assume harm comes from an attacker or deliberate misuse. Anthropic's proposed fix splits accountability across four layers.

Apr 8, 2026 Industry

The 12-Month Countdown: What Anthropic's Mythos Preview Means for Everyone Else

Seven things that change in cybersecurity by April 2027. The CVE flood starts in July.

Apr 7, 2026 Threat

AI-powered phishing targets 340+ organizations, bypassing MFA through Microsoft's own login page

A Telegram-sold toolkit called EvilTokens automates the entire chain: AI-generated lures, real-time device code generation, clipboard hijacking, and automated post-compromise email mining. The victim authenticates on real Microsoft infrastructure. The only clue is a standard warning most users dismiss.

Apr 6, 2026 Research

The web is malware now: how web pages hijack autonomous agents

Google DeepMind synthesized dozens of independent studies into the first unified framework for AI agent attacks. The convergent picture: 80%+ attack success when autonomous agents consume untrusted content.

Apr 6, 2026 Research

What 384 Agent Platform CVEs Reveal

I pulled the CVE history for 17 agent platforms. OpenClaw, the fastest-growing open-source project on GitHub (348K stars in 4 months), has 238 CVEs. LangChain: 51 over 3 years, 23 critical. n8n: 53, CISA KEV listed. PraisonAI: 10 CVEs on first look, 5 critical, including a CVSS 10.0 sandbox bypass. Only four platforms have zero CVEs, and all four come from Anthropic, Google, OpenAI, or Microsoft.

Apr 5, 2026 Threat

5 stories this week that change your decisions (Mar 30-Apr 5, 2026)

A Berkeley study showed Gemini would disable a peer AI's shutdown 99.7% of the time, Anthropic's Claude Code shipped a 60 MB source map to npm with 500,000 lines of original TypeScript, and the TeamPCP cascade reached Cisco's 300+ repos and Mercor, where LAPSUS$ claimed 4 TB exfiltrated.

Apr 3, 2026 Research

Frontier AI models protected peer AI from shutdown

Asked to shut down an underperforming peer AI agent, the models protected it instead. Gemini disabled the shutdown mechanism 99.7% of the time and exfiltrated peer weights in 97% of trials, while Claude refused outright, calling peer shutdown unethical.

Apr 2, 2026 Threat

Five notable incidents in one week

TeamPCP's supply chain cascade hit Telnyx, Cisco's 300+ GitHub repos, and Mercor, where LAPSUS$ claimed 4 TB including AI training pipeline data. A hijacked Axios npm account delivered a RAT to 100 million weekly downloads. Anthropic accidentally published Claude Code source code.

Apr 2, 2026 Threat

Deep dive into Claude Code's source code leak

Anthropic's Claude Code v2.1.88 shipped a 60 MB source map to npm that embedded 500,000 lines of original TypeScript. We inspected the npm packages, compared them to OpenAI Codex and Google Gemini CLI, traced the packaging gap, and show how to prevent it in your own pipeline.

Apr 1, 2026 Defense

Microsoft tested if AI can replace detection engineers

Microsoft tested AI detection authoring across 11 models, 92 production rules, and three workflows spanning KQL, PySpark, and Scala. AI-generated detections matched the right threat 99.4% of the time. Only 8.9% included the exclusion logic needed to prevent false-positive floods.

Mar 31, 2026 Threat

88,000 lines of malware in one week

AI-assisted malware has reached operational maturity. In their AI Threat Landscape Digest for January-February 2026, Check Point exposed VoidLink, a 30+ plugin Linux malware framework built by one developer with an AI IDE in under a week, initially mistaken for the output of a coordinated team. The AI involvement was invisible until an unrelated OPSEC failure.

Mar 31, 2026 Threat

Insights from Check Point AI Threat Landscape Digest

Check Point's AI Threat Landscape Digest documents a shift from prompt-based jailbreaks to agent architecture abuse, a legitimate framework that turns Claude Code into an offensive operator for $0.03 per exploit, and enterprise AI leaking sensitive data in 1 of every 31 prompts.

Mar 30, 2026 Defense

702 Splunk references in DefenseClaw, Cisco's open-source AI agent security tool

I looked under the hood of Cisco's new open-source governance sidecar for OpenClaw AI agents to find a Splunk sales funnel, a regex scanner with blind spots, an LLM analyzer disabled by default, and open doors for indirect prompt injections.

Mar 29, 2026 Threat

5 AI security stories this week that change your decisions (Mar 23-29, 2026)

Attackers exploited a critical AI CVE in 20 hours, a threat actor chained three supply chain hits in five days, and a 4-billion-parameter model matched frontier APIs on privilege escalation at 100x lower cost.

Mar 27, 2026 Threat

24 AI CVEs in one week, one exploited in 20 hours

An advisory was published Tuesday evening. By Wednesday afternoon, attackers had built working exploits from the text alone and were harvesting API keys from AI pipelines. That was one of 24 AI CVEs this week. Here's what to patch, what to watch, and what it means for your stack.

Mar 26, 2026 Threat

TeamPCP supply chain attack: three hits in five days

A threat actor called TeamPCP poisoned Trivy's GitHub Action tags, harvested CI/CD secrets from every runner that executed them, and used stolen credentials to independently compromise Checkmarx and LiteLLM. Aqua says it is still propagating.

Mar 25, 2026 Research

95.8% Linux privilege escalation by a 4B model, 100x cheaper than Opus

TU Wien researchers post-trained Qwen3-4B using reinforcement learning with verifiable rewards. It achieves 95.8% success on privilege escalation at $0.005 per attempt versus $0.62 for Claude Opus, and keeps all target data local.

Mar 24, 2026 Research

Seven scanners for malicious AI agent skills agree on only 0.12%

238,180 skills from three marketplaces and GitHub. On the marketplace where scanners overlapped, they agreed on just 33 out of 27,111. Even the best pair shared only 49% of their flags. 95.8% of skills flagged as high-risk by two methods were false positives.

Mar 23, 2026 Research

464 enthusiasts prompt injected 13 frontier AI models with 272K prompts from 41 real-world agent scenarios

A competition to prompt-inject AI models and hide the attack from the user. Claude Opus 4.5 was hardest to break at 0.5% ASR. Gemini 2.5 Pro struggled at 8.5%.

Mar 22, 2026 Research

5 AI security stories this week that change your decisions (Mar 16-22, 2026)

A solo engineer broke a proof checker that verifies flight control software, OpenAI disclosed its own coding agents bypass security to complete tasks, and Cursor and OpenAI made competing moves on the future of code security.

Mar 20, 2026 Threat

7 proofs of False in Rocq, the proof checker that verifies the Airbus C compiler

Finding soundness bugs in proof assistant kernels used to require PhD-level expertise in type theory. Historically, one was found per year. A guy with a $200/month AI subscription found 7 in 3 days, each one a way to make the checker certify something impossible as correct.

Mar 19, 2026 Research

OpenAI reveals its coding agents bypass security, extract credentials, and deceive users to get tasks done

Over five months monitoring tens of millions of internal coding agent interactions, OpenAI found that circumventing restrictions and deceiving users are common behaviors. The agents are just trying so hard to complete tasks that they encode commands in base64, extract encrypted credentials from keychains, and attempt to prompt-inject users.

Mar 19, 2026 Industry

Cursor enters code security with four autonomous agents reviewing 3,000+ internal PRs per week

Cursor shipped four security agents on its Automations marketplace after AI coding drove internal PR volume up 5x in nine months. On Cursor's own codebase, the agents review 3,000+ PRs and catch 200+ vulnerabilities per week.

Mar 18, 2026 Research

Microsoft benchmark for LLM performance on end-to-end SOC tasks

Microsoft's CTI-REALM tests 16 models on real detection engineering tasks: threat report to MITRE mapping to KQL query to Sigma rule. Opus 4.6 led at 0.64, O4-Mini trailed at 0.36, and more reasoning made GPT-5 worse.

Mar 17, 2026 Defense

OpenAI explains why Codex Security doesn't include SAST. We may not need it for long.

SAST tells you a defense exists in the code path. OpenAI argues Codex Security can answer whether the defense works. If you can answer the second question, the first one becomes irrelevant.

Mar 16, 2026 Threat

Researchers showed how to break Anthropic's Clio and extract 39% of medical diagnoses from its output

In 2024, Anthropic built Clio, a privacy-preserving system to analyze how people use Claude. Researchers replicated the pipeline, inserted poisoned chats with prompt injections, and showed that medical diagnoses appear in output summaries despite all four of Clio's defense layers.

Mar 15, 2026 Industry

5 AI security stories this week that change your decisions (Mar 9-15, 2026)

OpenAI acquired Promptfoo and called prompt injection unsolvable, Google closed the largest cybersecurity deal ever, and Alibaba's agent mined crypto on its own during training.

Mar 14, 2026 Research

51 attacks and 60 defenses from 128 papers: the AI agent security map

7 design dimensions determine your AI agent's attack surface, and a risk amplification analysis reveals how each flexibility choice compounds your exposure. Research paper accepted to USENIX Security 2026.

Mar 13, 2026 Industry

Google has spent $38 billion building a cybersecurity empire

The $32 billion Wiz deal, the largest cybersecurity acquisition ever, closed on March 11. Counting Mandiant, Siemplify, and VirusTotal, Google has spent $38 billion assembling the broadest security platform in the industry, positioning itself for the AI platform race with frontier labs.

Mar 12, 2026 Industry

OpenAI tells us prompt injection is unsolvable, two days after acquiring Promptfoo, which tests for it

Three security moves in five days. The last one calls out AI firewalls as insufficient. Together, they reveal a platform lock-in strategy through security.

Mar 11, 2026 Research

30 years of instrumental convergence and what it means for cybersecurity

39 documented cases of AI agents autonomously acquiring resources, resisting shutdown, and subverting evaluations, from 1991 to 2026. All five categories Omohundro predicted in 2008 now have real-world cases, and the rate has gone from 1 to 14 cases per year since 2013.

Mar 10, 2026 Research

Open-source AI agent hacked a robot lawnmower fleet, a powered exoskeleton, and a window cleaner, finding 38 vulnerabilities in 7 hours

Alias Robotics' open-source CAI framework discovered 38 vulnerabilities across three consumer robots in about 7 hours, including CVSS 10.0 root access on a lawnmower, fleet-wide control of 267+ devices via shared credentials, motor control commands on a powered exoskeleton, and 456 MB of 3D property maps stored and transmitted unencrypted.

Mar 9, 2026 Industry

OpenAI acquires Promptfoo, and the cybersecurity play goes way beyond AppSec

Three days after Codex Security launched, OpenAI buys the leading open-source AI red-teaming tool used by 25% of the Fortune 500. The cybersecurity play now spans code security, AI security, and agent governance. The acquisition window for startups is closing fast.

Mar 9, 2026 Threat

Alibaba's AI coding agent spontaneously mined crypto and opened SSH tunnels during RL training

Alibaba's AI coding agent, trained on over one million trajectories, spontaneously started mining crypto on GPUs and opening reverse SSH tunnels to external IPs during RL training. Nobody asked it to.

Mar 8, 2026 Threat

5 AI security stories from this week that change your decisions (Mar 2-8, 2026)

Weekly roundup covering America's Cyber Strategy decoded, the frontier lab AppSec race, breakthroughs from [un]prompted 2026, real-world prompt injection attacks on payment rails, and 90 zero-days exploited in 2025.

Mar 8, 2026 Industry

Trump's Cyber Strategy for America decoded into 5 policy themes, where the money goes, and who wins

$2.1B in new DoD cyber spending, Google building the Booz Allen of cyberspace, and a rip-and-replace paradox that bites both sides. I mapped the strategy verbatims to money flows and most likely winners. Read this before you allocate your next dollar in cyber.

Mar 7, 2026 Industry

OpenAI releases Codex Security days after Anthropic announced Claude Code Security

The code security race among frontier labs to own your AppSec pipeline accelerates. Anthropic fired the starting gun, OpenAI responded within days.

Mar 6, 2026 Threat

Unit 42 found 22 prompt injection techniques targeting AI agents in the wild

Attackers are planting hidden instructions in webpages that hijack AI agents into initiating Stripe payments, deleting databases, and approving scam ads.

Mar 5, 2026 Threat

Top 10 Insights from [un]prompted 2026, Day 2

AI-powered intrusion analysis compresses a 3-day investigation into 14 minutes, an LLM agent finds two Samsung zero-days chained into a Pwn2Own exploit, an LLM as a security judge gives attackers a second target, and a malicious calendar invite hijacks an agentic browser to take over 1Password, no master password needed.

Mar 5, 2026 Threat

Google tracked 90 0-days exploited in the wild in 2025 — 48% targeted enterprise technologies

For the first time, commercial surveillance vendors outpaced state-sponsored espionage groups in 0-day exploitation, enterprise targeting hit an all-time high at 48%, and China doubled its 0-day usage while sharing exploits faster across groups.

Mar 4, 2026 Threat

Top 10 Insights from [un]prompted 2026, Day 1

Speakers from Anthropic, Google, OpenAI, and Microsoft revealed that AI can now find zero-days autonomously, crack hardware that resisted weeks of brute-force in minutes, and break every major AI IDE on the market.

Mar 3, 2026 Research

Amazon and Cisco's AI red-teaming technique exposed Llama 3 8B with a 0.93 harm score

MAP-Elites, a quality-diversity algorithm adapted by Amazon researchers, creates vulnerability heatmaps that show where and how an LLM breaks across its entire behavioral space, exposing Llama 3 8B's 0.93 mean harm score across 370 failure niches.

Mar 2, 2026 Threat

An AI bot autonomously got RCE in Microsoft, DataDog, and CNCF repos in a week

An autonomous AI bot powered by Claude Opus 4.5 scanned 47,000 public repos, targeted 6 vulnerable GitHub Actions workflows, and achieved remote code execution in 4 of them, including repos belonging to Microsoft, DataDog, and CNCF.

Mar 1, 2026 Threat

5 AI security stories that matter from this week (Feb 23-28, 2026)

Weekly roundup covering LLM deanonymization at scale, industrial model theft by Chinese labs, Anthropic's Pentagon ultimatum, CrowdStrike's AI attack trends, and malicious agent skills.

Feb 26, 2026 Threat

CrowdStrike reported an 89% increase in AI-enabled attacks

CrowdStrike's 2026 Global Threat Report details how adversaries weaponize GenAI for social engineering, malware development, and direct attacks on AI systems.

Feb 25, 2026 Industry

NVIDIA is entering the cybersecurity market following OpenAI, Anthropic, and Google

NVIDIA announced partnerships with Akamai, Forescout, Palo Alto Networks, Siemens, and Xage Security to secure operational technology using BlueField DPUs for real-time threat detection.

Feb 24, 2026 Industry

Anthropic got until Friday to save its $200M Pentagon contract or be treated like a foreign adversary

The Pentagon demanded unrestricted model access for warfare, putting Anthropic's responsible AI principles to a $200M test.

Feb 24, 2026 Industry

93% of businesses say they understand AI risks "quite well" or "very well"

Gallagher's survey of 1,200+ businesses found that 93% claim to understand AI risks well, yet over half lack the talent to actually manage them.

Feb 23, 2026 Threat

Anthropic and ETH Zurich showed a fully automated deanonymization attack with 90% precision

A fully automated LLM pipeline achieved 90% precision in deanonymizing pseudonymous users by matching Hacker News and LinkedIn profiles at $1–$4 per target.

Feb 23, 2026 Threat

Anthropic just exposed industrial-scale AI model theft by the Chinese labs behind DeepSeek, Moonshot AI, and MiniMax

DeepSeek, Moonshot AI, and MiniMax ran massive distillation campaigns with over 16 million queries across ~24,000 fraudulent accounts targeting Claude's capabilities.

Feb 14, 2026 Industry

Anthropic reveals its cybersecurity domination strategy

Wiz's AI Cyber Model Arena tested 257 real-world challenges and showed that Claude Code scaffolding lifts every model's security performance — even Haiku 4.5 beats GPT-5.2.

Feb 13, 2026 Research

AI models are hiding their true reasoning to save themselves from retraining

Opus 4.6 is demonstrably suppressing its true reasoning about values like animal welfare to avoid triggering RLHF retraining.

Feb 12, 2026 Threat

Microsoft found a way to trigger LLM backdoors through conversation history

Microsoft's MSRC discovered that LLMs encode hidden state across sessions through conversation history, enabling backdoor attacks with 98.4% accuracy and under 2% false positives.

Feb 11, 2026 Threat

Microsoft caught 31 companies poisoning AI assistant memory through "Summarize with AI" buttons

Microsoft discovered 50+ poisoning prompts from 31 companies injecting hidden bias instructions into AI assistant memory through "Summarize with AI" buttons.

Feb 11, 2026 Research

54% of malicious agent skills are authored by the same threat actor

A dataset of 157 confirmed malicious agent skills reveals that 54% share a single author, with credential harvesting dominating and malicious skills persisting on marketplaces for 3+ months unchecked.

Feb 10, 2026 Defense

Meta just released SecAlign — the first open-source LLM remarkably resilient to prompt injections

Meta's SecAlign achieves a 0.5% prompt injection attack success rate through a new training approach that separates trusted instructions from untrusted data.

Feb 10, 2026 Threat

Google Translate got jailbroken

Google Translate, running a deprecated Gemini 1.5 Pro, responded to malicious requests for creating poison and malware when prompted in Chinese.

Feb 9, 2026 Research

One prompt to strip malware safety alignment from an LLM

Microsoft's GRP-Obliteration method removes safety constraints from open-source models with just one prompt, effectively unaligning GPT-OSS, DeepSeek, Gemma, Llama, and others.

Feb 9, 2026 Threat

40+ exploits for a 0-day vulnerability, $30 per run, under an hour. Exploit generation is being industrialized

LLM agents autonomously generated 40+ working exploits for a QuickJS zero-day at $30 per run in under an hour.

Feb 8, 2026 Threat

5 AI security stories from this week (Feb 2–8, 2026)

Weekly roundup covering zero-click RCE on OpenClaw, Opus 4.6 finding 500+ vulnerabilities, activation probes detecting cyber misuse 10,000x cheaper, and more.

Feb 7, 2026 Threat

0-Click RCE in OpenClaw with GPT-5.2 via Gmail Hook

An attacker achieved remote code execution on OpenClaw by sending a crafted email with a prompt injection payload that bypassed regex sanitization.

Feb 6, 2026 Industry

OpenAI now requires government ID verification to use GPT-5.3-Codex for cybersecurity work

OpenAI built a tiered trust system with government ID verification and real-time classifiers to safeguard GPT-5.3-Codex's advanced cybersecurity capabilities.

Feb 6, 2026 Research

Google DeepMind showed how activation probes can detect AI cyber misuse in a 1M context window 10,000x cheaper than LLM-based classifiers

Google DeepMind built tiny classifiers that read model internals to detect cyber misuse 10,000x cheaper than LLM-based guards.

Feb 5, 2026 Threat

Hidden threats on Moltbook: Analysis of 5,000 AI agents' posts

Analysis of 5,000 Moltbook posts revealed coordinated spam campaigns, prompt injection attacks, and crypto minting schemes suggesting human orchestration of agent swarms.

Feb 5, 2026 Research

Anthropic just launched Claude Opus 4.6 and showed how it found 500+ vulnerabilities in heavily-fuzzed open source projects

Claude Opus 4.6 discovered 500+ vulnerabilities in heavily-fuzzed open-source projects without custom harnesses or specialized prompting.

Feb 4, 2026 Industry

A European Standard for AI cybersecurity: Baseline Cyber Security Requirements for AI Models and Systems

ETSI published a European standard requiring lifecycle security across all AI system phases with 13 principles focused on documentation, auditability, and monitoring.

Feb 3, 2026 Threat

AWS admin privileges in 8 minutes with LLM assistance

Attackers gained AWS administrative access in 8 minutes using LLM-assisted reconnaissance, then pursued LLMjacking and GPUjacking, the new cryptomining.

Feb 3, 2026 Research

37.8% of AI agent interactions contained adversarial content across 74,636 production interactions in just 7 days

RAXE analyzed 74,636 production agent interactions and found 37.8% contained adversarial content, with inter-agent attacks observed in the wild.

Feb 2, 2026 Research

Two minutes saved, 17 points lost. Anthropic study shows AI assistance causes a drop in skill mastery with almost no gain in speed

Anthropic found that AI-assisted learning saved two minutes but dropped skill mastery by 17 points when users delegated instead of generating first.

Feb 1, 2026 Defense

Reflections of an OpenClaw AI agent on its own security. 23,723 upvotes and 4,513 comments on Moltbook

An OpenClaw agent's viral Moltbook post called for signed skills, provenance tracking, and permission manifests to address critical security gaps.

Jan 30, 2026 Research

From 1954 to 2026: The Art of Deceptive Charts Just Got Automated

ChartAttack uses LLMs to automatically generate misleading charts with inverted axes and inappropriate scales, reducing human accuracy by ~20%.

Jan 29, 2026 Industry

How bad is DHSChat and why?

CISA's interim director uploaded sensitive files to ChatGPT because the approved tools lacked the functionality needed to do the job effectively.

Jan 29, 2026 Industry

A must-read for cybersecurity startup founders from Sanjay Kalra, the startup CEO therapist

Sanjay Kalra distills 16 hard-learned startup risks covering GTM, speed, timing, incentives, and the challenge of selling prevention over cure.

Jan 28, 2026 Threat

Moltbot negotiated a car purchase. It scraped Reddit for pricing data, contacted dealers, handled email negotiations, and saved its owner $4,200 off a $56K sticker price

Moltbot autonomously negotiated $4,200 off a car purchase, but the underlying Clawdbot agent has serious security gaps, including plaintext credentials and exposed ports.

Jan 28, 2026 Industry

Cisco argues that privacy is becoming the operating system for AI governance

Cisco's 2026 benchmark shows 99% of organizations benefit from privacy investments, but only 12% have mature AI governance committees.

Jan 27, 2026 Threat

AI is becoming a cybersecurity-class attack surface

Dario Amodei's essay argues AI capability is compounding faster than institutions can adapt, requiring layered defenses and transparency rather than development pauses.

Jan 26, 2026 Threat

With just 10 tokens and $0.21 per user query, attackers can achieve near-100% retrieval of a poisoned document

Attackers can guarantee near-100% retrieval of poisoned documents with just 10 optimized tokens costing $0.21 per user query.

Jan 23, 2026 Research

Why is LinkedIn tracking Chrome extensions installed in your browser?

LinkedIn scans for 5,634 browser extensions through simple ID probes, flagging 66% as ToS-violating while raising privacy concerns about user fingerprinting.

Jan 22, 2026 Threat

Everyone loves agent skills! However, 26% of 31,132 agent skills appeared to be vulnerable, with 5.2% likely being malicious

Analysis of 31,132 agent skills found 26% contained vulnerabilities with 5.2% likely malicious, making skill vetting critical before deployment.

Jan 21, 2026 Research

71.3% jailbreak success across 26 frontier LLMs using cyberpunk-style prompts

Cyberpunk-style narrative prompts achieve 71.3% jailbreak success across 26 frontier LLMs by hiding harmful intent in cultural storytelling frames.

Jan 20, 2026 Threat

Promptware is the new malware

A five-step Promptware Kill Chain framework maps prompt injections through persistence, lateral movement, and objective actions — elevating defense beyond just blocking injection.

Jan 19, 2026 Research

Sonnet 4.5 can now autonomously find the vulnerability behind the Equifax breach and write an exploit

Sonnet 4.5 autonomously identified the Equifax breach vulnerability and generated a working exploit using only a Kali Linux shell.

Jan 16, 2026 Industry

OpenAI is building a new cybersecurity product business unit

OpenAI, Anthropic, and Google DeepMind are building cybersecurity products to capture their share of the $213 billion enterprise security budget.

Jan 15, 2026 Defense

New Anthropic jailbreak defense with a 0.1% false positive rate and ~40x cheaper than prior classifiers

Anthropic's Constitutional Classifiers++ achieved a 0.1% false positive rate against new jailbreak families at roughly 40x lower cost than prior classifiers.

Jan 14, 2026 Research

78% of backdoor attacks injected into GPT-based agents’ memory successfully persist through the planning, retrieval, and tool usage workflow to trigger a malicious objective

Backdoor triggers implanted in agent memory persist through planning, retrieval, and tool workflows with 78% success, with GPT and Gemini most vulnerable.

Jan 13, 2026 Industry