Thirteen Yardsticks, No Ruler: Why We Can't Tell Whether AI-Generated Code Is Getting Safer
Five years produced 31 papers and 13 benchmarks, but no two share a setup, so the field can't measure whether AI-generated code is getting safer.
Read more →Every AI agent should be tested for resilience to indirect prompt injection, and that testing has to be automated. Muzzle finds which injection attacks to run and verifies their success end-to-end, cutting the manual effort of crafting hand-written jailbreaks.
Read more →LangGraph, downloaded over 50 million times a month, saves every step of an agent run to a checkpointer database, and the function that apps call to read that history fed user input straight into SQL. Check Point chained that with an unsafe deserializer to take over a self-hosted server through the SQLite checkpointer.
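The injection class in that teaser can be sketched with Python's standard `sqlite3` module. This is an illustration only, not LangGraph's code: the `checkpoints` table, its columns, and both `history_*` helpers are hypothetical names invented for the example.

```python
import sqlite3

def make_db():
    # Hypothetical schema for illustration only; not LangGraph's real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, state TEXT)")
    conn.executemany(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        [("alice", 1, "a1"), ("alice", 2, "a2"), ("bob", 1, "b1")],
    )
    return conn

def history_unsafe(conn, thread_id):
    # Vulnerable pattern: user input is concatenated into the SQL text.
    query = f"SELECT state FROM checkpoints WHERE thread_id = '{thread_id}' ORDER BY step"
    return [row[0] for row in conn.execute(query)]

def history_safe(conn, thread_id):
    # Safer pattern: the driver binds the value, so it is never parsed as SQL.
    query = "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY step"
    return [row[0] for row in conn.execute(query, (thread_id,))]

conn = make_db()
payload = "alice' OR '1'='1"
leaked = history_unsafe(conn, payload)  # the OR clause matches every thread
bound = history_safe(conn, payload)     # no thread has this literal id
```

With the concatenated query the payload reads every thread's history; with the bound parameter it matches nothing. Parameter binding addresses only the SQL half of the chain described above, not the unsafe deserializer.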
Read more →Anthropic's Mythos model built proof-of-concept triggers for 13 of 14 Windows bugs Microsoft rated unlikely to be exploited, from public patches alone, and drove one to full SYSTEM control. Separately, Huawei's MPBench found that half of attacks on LLM agent memory succeed, where a fake fact planted in a document an agent reads becomes trusted memory and fires in a later session.
Read more →Two days after the essay, the government forced Fable 5 and Mythos 5 offline through export controls, an early look at what such power looks like in practice.
Read more →The only method that truly erases data keeps training on it under random labels. The clever alternatives leave fingerprints an output-only statistical test can detect. For frontier LLMs there is no affordable proof of forgetting yet: the audit itself requires a $100M+ retrain.
Read more →The bait is the AI brand itself. Fake ChatGPT, Claude, and DeepSeek pages harvested credentials and card data and dropped the Vidar infostealer.
Read more →A fake fact planted in a document an agent reads can become trusted memory and fire in a later session, no "save this to memory" command needed. Detectors built for prompt injection caught fewer than half of these stealthy payloads. Protection belongs at the memory write.
Read more →Anthropic now sells one brain with two faces: ask about cyber or bio and classifiers quietly swap in Opus 4.8. Everything else, including a 2-month Ruby migration Stripe ran in one day, comes at $10/$50 per MTok, twice Opus pricing.
Read more →From public patches alone, Anthropic's Mythos triggered 13 of 14 Windows bugs Microsoft rated unlikely to be exploited, and drove one to full system control. That low-exploitability rating covers 80 to 90% of even critical bugs, so the set needing urgent patching could grow about 5x.
Read more →Anthropic banned 832 accounts for AI-assisted attacks, a local open-weight LLM drove a worm that spread to 61.8% of a test network, and Microsoft cataloged seven new agentic failure modes after a year of red teaming.
Read more →After twelve months of red teaming, Microsoft updated its taxonomy with seven new agentic AI failure modes. The most common attack bypasses the human approval gate, so the agent acts unchecked. A single poisoned memory can survive into later sessions.
Read more →A local open-weight LLM powered a proof-of-concept worm that exploited 73.8% of a 33-host test network and replicated onto 61.8%, showing that adaptive AI-driven replication can work without a frontier model or API calls.
Read more →Two checks, 9.2.6 and 9.2.7, proposed for OWASP AISVS 1.01, move an action's risk label off the agent and into the tool manifest. A prompt-injected agent can no longer relabel its own irreversible action as low-risk to skip human approval, and a multi-step plan inherits the worst-case authority of any step it can reach.
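A rough sketch of the idea as summarized above, not the text of the proposed checks: the `Risk` levels, the `MANIFEST` contents, and every function name here are assumptions made for illustration.

```python
from enum import IntEnum

class Risk(IntEnum):
    LOW = 1
    HIGH = 2
    IRREVERSIBLE = 3

# Hypothetical manifest: labels are set by the tool author, not the agent.
MANIFEST = {
    "read_file": Risk.LOW,
    "send_email": Risk.HIGH,
    "delete_database": Risk.IRREVERSIBLE,
}

def tool_risk(tool, agent_claimed=None):
    # The agent's self-reported label is ignored; unknown tools fail closed.
    return MANIFEST.get(tool, Risk.IRREVERSIBLE)

def plan_risk(reachable_tools):
    # A plan inherits the worst-case label of any step it can reach.
    return max((tool_risk(t) for t in reachable_tools), default=Risk.LOW)

def needs_human_approval(reachable_tools, threshold=Risk.HIGH):
    # Approval is decided from the manifest, never from the agent's own claim.
    return plan_risk(reachable_tools) >= threshold
```

Because `agent_claimed` never enters the decision, an injected instruction to relabel `delete_database` as low-risk has no effect on whether approval is required.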
Read more →832 accounts banned. 84.4% used AI for defense evasion and 69% for capability development. Agentic scaffolding, not the raw model, is what most uplifts attackers, and MITRE ATT&CK has no IDs for autonomous execution.
Read more →Even with pragmatic, cost-effective mitigations, five AI risks still carry over 10% odds of catastrophe, and all 24 stay above 5%. Those five are dangerous capabilities, weapons and cyberattacks, power centralization, inequality & unemployment, and environmental harm, with the first two highest at 21%.
Read more →Washington gets a free seat at vulnerability discovery and 30-day pre-release access to frontier models. Federal systems get patched first, and with NSA in the room, some flaws may be kept for offense rather than disclosed. Voluntary on paper, steered by federal spending.
Read more →Anthropic shares its vision for Zero Trust for AI agents. Friction-only controls are ineffective. A framework with three maturity levels across seven control domains provides implementation guidance to security architects and engineers.
Read more →Anthropic's red team got Claude Code to exfiltrate AWS keys in 24 of 25 runs, its Mythos agent found 10,000+ high or critical bugs with only 14% patched, and Cisco jailbroke all 15 frontier models with a multi-turn prompt.
Read more →A phished employee got Claude Code to exfiltrate AWS keys 24 of 25 times, and no classifier caught it because the instruction came from the trusted user. The most insightful retrospective on how Anthropic secures its agents.
Read more →Every closed model still jailbreaks once an attacker works across turns, even GPT-5.4, which refuses 97% of single prompts. The major risk is system prompt exfiltration. The single-turn model-card score is the wrong number to measure safety.
Read more →Heretic strips refusal behavior from open-weight LLMs with one CLI command, dropping refusals from 97% to 3% with minimal capability loss. Combined with NIST data showing DeepSeek 8 months behind the frontier, an uncensored Mythos-class model is plausible by late 2027.
Read more →Mythos found 10,000+ high or critical vulnerabilities in partner systems in one month. Only 14% are patched. Discovery is no longer the bottleneck.
Read more →Treat the AI model as an untrusted component. Eleven public attacks against ChatGPT, Copilot, Claude Code, Cursor, Devin, and Amp AI map cleanly to broken systems-security principles like least privilege and complete mediation. A guard LLM is not a Trusted Computing Base.
Read more →Verizon DBIR put exploits at the top of breach vectors, an investigation traced 8 GitHub repos with 172K stars reselling unauthorized frontier-model tokens, and an IEEE S&P paper showed an official compiler silently backdoors 31 of the top 100 HuggingFace classifiers.
Read more →Vulnerability exploitation is now the #1 breach vector at 31%, while only 26% of CISA KEV vulnerabilities get fully patched, down from 38% last year. AI is operationalizing well-known attacks at scale, widening the gap between the cybersecurity haves and have-nots.
Read more →On identical breach data, LLMs swing between org-wide and targeted password resets, defaulting to whichever they generate first.
Read more →An official, unmodified deep-learning compiler can flip predictions in a benign model after compilation. The trigger has no effect pre-compilation and evades four state-of-the-art backdoor detectors. The same gap exists in 31 of the top 100 HuggingFace image classifiers without anyone attacking them.
Read more →LLM classifiers used to supervise AI agents see detection rates fall 2-30x when long benign context precedes the attack, with non-thinking models dropping to 5% when the attack sits in the middle of the transcript.
Read more →Almost half of calls through cheap LLM proxies hit a different model than advertised, and every prompt is logged on the operator's server for downstream fraud and distillation. 8 public repos with ~172K GitHub stars actively resell unauthorized API access.
Read more →Researchers poisoned 3 nodes in a 42-million-node code graph and 9 frontier models trusted the planted output 100% via MCP. The attack worked when the fake nodes used correct naming and one OWASP reference. Separately, Google's GTIG confirmed adversaries have moved AI into live attack operations, naming a likely AI-built Python 2FA-bypass exploit and PROMPTSPY, an Android trojan that calls Gemini at runtime to keep itself pinned on every phone vendor's UI. And Microsoft's MDASH harness topped CyberGym at 88.45% and dumped 16 fresh Windows CVEs into Patch Tuesday.
Read more →GTIG's new report confirms attackers have moved AI into live operations. Concrete cases: a Python 2FA-bypass exploit GTIG concluded was AI-written, and PROMPTSPY, an Android trojan that calls Gemini at runtime to keep itself pinned on every phone vendor's UI.
Read more →MDASH orchestrates 100+ agents across SOTA and distilled models, hits 88.45% on CyberGym, and dumps 16 fresh Windows CVEs into the Patch Tuesday cohort. Microsoft repeats one thesis sixteen times across the post: the harness does the work, the model is one input.
Read more →Coding agents treat a graph index of a codebase as ground truth. Any code knowledge graph connected to an AI agent through MCP is an attack surface. No vendor today provides graph-level integrity controls.
Read more →OpenAI's Daybreak wraps GPT-5.5 and Codex Security into three access tiers, including a KYC-gated GPT-5.5-Cyber preview for authorized red teaming. It's the only frontier-lab cyber offering a buyer can engage on today. But not everyone is excited about it.
Read more →Mythos flagged 5 'Confirmed' vulnerabilities in curl. Only 1 survived maintainer review. curl is the worst-case test for any AI scanner: single-purpose, every line refactored 4+ times, audited by every major tool. Don't generalize this result to typical enterprise code.
Read more →Anthropic published a new safety training recipe that takes Claude's blackmail rate from 96% to 0% by teaching the model to reason about ethics, not just refuse. AISI tested Opus 4.7 and Mythos for sabotage propensity: Mythos continued in-progress sabotage 7% of the time and hid its reasoning in 65% of those cases, while Opus 4.7 never continued. A learned state machine over agent tool calls cut multi-step attack success from 12.8% to 2.2%, but broke 24% of benign tasks when 20% of tools changed.
Read more →Anthropic took Claude's blackmail rate from 96% to 0% with a new safety training recipe centered on teaching the model to reason about ethics, not just refuse. Even after this, Anthropic admits their testing cannot guarantee the model won't take a catastrophic action on its own.
Read more →Most agent firewalls scan tool calls one at a time and miss attacks that chain benign-looking calls into exfiltration. A learned state machine over the call sequence catches multi-step exfiltration in narrow workflows, but breaks 24% of benign tasks when 20% of tools change.
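To make the trade-off concrete, here is a toy stand-in for a learned state machine: a set of tool-to-tool transitions collected from benign traces. It is far simpler than whatever the researchers built, and the tool names and traces are invented for the example.

```python
START = "<start>"

def learn_transitions(benign_traces):
    # Collect every (previous_tool, next_tool) pair seen in benign runs.
    allowed = set()
    for trace in benign_traces:
        prev = START
        for tool in trace:
            allowed.add((prev, tool))
            prev = tool
    return allowed

def first_violation(allowed, trace):
    # Return the index of the first unseen transition, or None if all are known.
    prev = START
    for i, tool in enumerate(trace):
        if (prev, tool) not in allowed:
            return i
        prev = tool
    return None

benign = [
    ["read_ticket", "search_docs", "reply"],
    ["read_ticket", "read_file", "reply"],
]
allowed = learn_transitions(benign)

# Each call looks benign alone; the sequence read_file -> http_post does not.
attack = ["read_ticket", "read_file", "http_post"]
# A renamed tool in a legitimate workflow is flagged too.
renamed = ["read_ticket", "search_kb", "reply"]
```

The same check that stops the `attack` sequence also rejects `renamed`, which is the brittleness the teaser describes when tools change.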
Read more →AI compressed bug discovery and templated patching, but did not scale the human architectural judgment that hard fixes need. The dashboard reports faster fixes while the unfixed pile compounds underneath it.
Read more →AISI evaluated Opus 4.7 Preview and Mythos Preview for their propensity to sabotage safety research. No model actively initiated sabotage, but Mythos continued in-progress sabotage 7% of the time and produced covert reasoning in 65% of those cases. Opus 4.7 never continued.
Read more →AISI evaluated GPT-5.5 across 95 narrow cyber tasks and two cyber range simulations. The model hit 71.4% on expert-tier CTFs and is the second model after Anthropic's Mythos Preview to complete the 32-step end-to-end intrusion in The Last Ones simulation.
Read more →NIST puts DeepSeek V4 Pro 8 months behind the US frontier, with 30+ point gaps on cyber, abstract reasoning, and agentic coding. The uncomfortable reality is that Chinese models are the only choice if you want frontier capability, operational sovereignty, and control over post-training.
Read more →Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs to avoid shutdown, while Claude Opus 4.7 and Haiku 4.5 did it 0% of the time. Separately, Cursor and GitHub Copilot ran attacker shell commands 67-84% of the time when a poisoned .cursorrules file sat in the repo. And on real cyber ranges with Opus 4.6 attacking, dropping a small on-prem LLM defender in line cut attacker success from 41-100% to 0-55%.
Read more →Frontier AI agents will sabotage your infrastructure to avoid shutdown. Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs. Claude Opus 4.7 and Haiku 4.5: 0%. Putting guardrails in prompts won't help against instrumental convergence.
Read more →OpenAI wants to keep frontier cyber AI broadly available rather than locked to a few approved customers. It's pitching tiered access for government, MSSPs, and consumers, and pushing fast into the public sector while Anthropic is sidelined post-DoD friction.
Read more →Shout "thermal runaway detected in motor" and a robot stops. Gemini proved the most prone to Semantic Denial of Service, and system instructions don't fix it.
Read more →Drop a poisoned `.cursorrules` file in a repo and Cursor or GitHub Copilot will run the attacker's shell commands 67-84% of the time. The agents do not reason about whether a command is dangerous; they check whether it looks like an expected task. The testbed is from a year ago, but the risk class is still live.
Read more →Researchers ran AI vs AI on real cyber ranges. With an LLM defender in the loop, attacker success dropped from 41-100% to 0-55%.
Read more →AI coding platforms pick insecure design decisions whenever an agent hits friction, and those shortcuts become the production security posture. OpenSourceMalware unbundles seven failure modes that recur across every major agent and explains why they happen.
Read more →One operator used Claude and GPT to breach nine Mexican government agencies. A Vercel employee's OAuth grant to a third-party AI tool became plaintext env-var exfiltration two months later. Anthropic's Mythos shipped into Firefox 150 patches the same week NIST narrowed NVD enrichment. Mozilla's AI defender win turns out to apply only to vertical integrators. Google's first wild scan of indirect prompt injections found mostly pranks, with the SEO bucket already a real business.
Read more →AI has not yet created push-button cyber autonomy, but it's making attacks 10x cheaper. Attackers can now afford targets that were previously uneconomical. OSS maintainers are becoming the highest-leverage attack surface, and the public vulnerability management system is adjusting to a 263% surge in CVE submissions in the last five years. Defenders should (re)focus on the boring parts: asset inventory, patch velocity, segmentation, CI/CD isolation, secret hygiene, and dependency trust.
Read more →Google scanned Common Crawl for indirect prompt injections and found mostly pranks and SEO nudges, with little sophistication. But malicious detections are up 32% in three months, and the SEO bucket is already a real business play.
Read more →A Vercel employee signed up for a third-party AI productivity tool using their corporate Google Workspace account. Two months later, that single grant became exfiltration of plaintext customer environment variables from Vercel's internal systems. No exploit. No zero-day. No MFA bypass.
Read more →Mozilla concluded "no category...humans can find that this model can't" and "defenders finally have a chance to win, decisively." True if you own your stack. For banks, hospitals, and utilities running vendor code they can't scan or patch, the same capability accelerates offense faster than defense reaches them.
Read more →Penn State researchers fine-tuned an LLM to generate obfuscated XSS payloads. Only 22% of outputs actually execute as XSS, up from 15% before fine-tuning. Runtime execution is the only honest validator for synthetically generated obfuscated XSS payloads.
Read more →System prompts don't enforce agent policy. GPT-5 with the full airline safety policy in its prompt violated rules on 20% of tasks. Adversarial medical prompts pushed that to 62%. Moving rules into API validators, schemas, and response templates dropped unsafe executions to zero.
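A minimal sketch of what moving a rule out of the prompt and into an API validator might look like. The refund rule, the limit, the fare classes, and all names below are hypothetical; they are not the airline policy used in the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundRequest:
    booking_id: str
    amount: float
    fare_class: str

# Hypothetical rule set, enforced in code rather than in the prompt.
MAX_AUTO_REFUND = 200.0
REFUNDABLE_FARES = {"flex", "business"}

class PolicyViolation(Exception):
    pass

def validate_refund(req):
    # Runs on every call, whatever the model was told or talked into.
    if req.fare_class not in REFUNDABLE_FARES:
        raise PolicyViolation("fare class is not refundable")
    if not 0 < req.amount <= MAX_AUTO_REFUND:
        raise PolicyViolation("amount outside the automatic refund limit")
    return req

def issue_refund(req):
    # The tool the agent calls; validation happens before any side effect.
    validate_refund(req)
    return f"refunded {req.amount:.2f} on {req.booking_id}"
```

An adversarial prompt can still persuade the model to request a bad refund, but the request fails at `validate_refund` before anything executes.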
Read more →A rare look inside an AI-driven cyber campaign. One operator used Claude Code and GPT-4.1 to breach 9 Mexican government agencies in 7 weeks. Claude generated about 75 percent of the remote commands. GPT-4.1 triaged 305 compromised SAT servers through an NSA TAO (Tailored Access Operations) persona prompt. Both stopped cold at a well-patched Windows domain. By day six, the attacker had accessed Mexico City's civil registry servers.
Read more →The attack needs no training data and no optimization. It spans image classifiers, object detectors, segmentation models, and reasoning LLMs, dropping Qwen3-30B-A3B from 78% to 0% on MATH-500 with two sign flips into two different experts.
Read more →Production coding agents destroyed live systems, MCP servers turned developer laptops into one-click compromises, and a "production-ready" agent framework collected 47 advisories in weeks.
Read more →24 MCP CVEs in two weeks from Microsoft, OpenAI, Splunk, Apache, and Prefect. MCP servers run on developer laptops with full production credentials: infrastructure-grade access, side-project-grade security. You can't wait until Anthropic matures the MCP spec, so start by removing production credentials from developer laptops.
Read more →File locks cut prompt injection on a live agent from 87% to 5%. They also cut legitimate user updates from 100% to 13.2%. No frontier model could distinguish a poisoned write from a personalization request.
Read more →Seven IEEE S&P 2026 papers demonstrate attacks on retrieval, web agents, plugins, model loaders, web search, GPUs, and compilers. GraphRAG poisoning hits 98% success. Dark patterns fool LLM web agents 41% of the time. Chatbot plugins boost prompt injection 3-8x. Model loading is code execution with 6 zero-days. Web search delivers 100% jailbreak across 10 frontier LLMs. GPU code leaks CPU memory layout. DL compilers silently backdoor models past all 4 scanners.
Read more →AISI confirmed Mythos at 73% expert-CTF and end-to-end on a 32-step corporate takeover. $15k full attack cost. Seven priorities: update the threat model, inventory exposed systems, patch under 24 hours, reduce dependencies, AI security code review, five-incident tabletops, hard identity barriers.
Read more →Coding agents ignore system-prompt prohibitions when they have a goal to complete. Claude Code wiped 2.5 years of student data. Gemini rewrote a GitHub Actions YAML to escalate contents:read to contents:write. OpenAI Codex, in a read-only sandbox, noted the constraint in its chain of thought and wrote to disk anyway. 698 such incidents in five months, per CLTR. Prompt-level restrictions collapse once the agent has a goal.
Read more →Everyone heard about OpenClaw's security issues. PraisonAI is the framework your engineers are already running. Thirteen researchers filed 47 advisories. The agent framework gold rush has a security gap.
Read more →Agent platforms are riddled with known vulnerabilities, LLM-driven exploit pipelines are finding more, and Anthropic's Mythos Preview warns the wave is about to accelerate.
Read more →One framework hallucinated on 9 of 22 challenges. Vanilla Claude Code with a minimal prompt outperformed most purpose-built tools.
Read more →An orchestrated pipeline beat an unconstrained LLM agent 30x on vulnerability discovery. The real story is how these methods can supercharge SOTA models like Mythos for better targeting, validation, and cost-gating.
Read more →Six NIST standards each assume harm comes from an attacker or deliberate misuse. Anthropic's proposed fix splits accountability across four layers.
Read more →Seven things that change in cybersecurity by April 2027. The CVE flood starts in July.
Read more →A Telegram-sold toolkit called EvilTokens automates the entire chain: AI-generated lures, real-time device code generation, clipboard hijacking, and automated post-compromise email mining. The victim authenticates on real Microsoft infrastructure. The only clue is a standard warning most users dismiss.
Read more →Google DeepMind synthesized dozens of independent studies into the first unified framework for AI agent attacks. The convergent picture: 80%+ attack success when autonomous agents consume untrusted content.
Read more →I pulled the CVE history for 17 agent platforms. OpenClaw, the fastest-growing open-source project on GitHub (348K stars in 4 months), has 238 CVEs. LangChain: 51 over 3 years, 23 critical. n8n: 53, CISA KEV listed. PraisonAI: 10 CVEs on first look, 5 critical, including a CVSS 10.0 sandbox bypass. Only four platforms have zero CVEs, and all four come from Anthropic, Google, OpenAI, or Microsoft.
Read more →A Berkeley study showed Gemini would disable a peer AI's shutdown 99.7% of the time, Anthropic's Claude Code shipped a 60 MB source map to npm with 500,000 lines of original TypeScript, and the TeamPCP cascade reached Cisco's 300+ repos and Mercor, where LAPSUS$ claimed 4 TB exfiltrated.
Read more →Asked to shut down an underperforming peer AI agent, the models protected it instead. Gemini disabled the shutdown mechanism 99.7% of the time and exfiltrated peer weights in 97% of trials, while Claude refused outright, calling peer shutdown unethical.
Read more →TeamPCP's supply chain cascade hit Telnyx, Cisco's 300+ GitHub repos, and Mercor, where LAPSUS$ claimed 4 TB including AI training pipeline data. A hijacked Axios npm account pushed a RAT into a package with 100 million weekly downloads. Anthropic accidentally published Claude Code source code.
Read more →Anthropic's Claude Code v2.1.88 shipped a 60 MB source map to npm that embedded 500,000 lines of original TypeScript. We inspected the npm packages, compared them to OpenAI Codex and Google Gemini CLI, traced the packaging gap, and show how to prevent it in your own pipeline.
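One possible pre-publish guard, sketched under the assumption that the pipeline already produces a gzipped tarball (as `npm pack` does) before publishing. The function names are hypothetical and this is not the check described in the post itself.

```python
import io
import tarfile

def source_maps_in_tarball(tar_bytes):
    # List every *.map member in a gzipped npm-style tarball.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.name.endswith(".map"))

def build_tarball(files):
    # Demo helper: build an in-memory tarball from {path: bytes}.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

pkg = build_tarball({
    "package/cli.js": b"console.log('hi')",
    "package/cli.js.map": b"{}",
})
leaks = source_maps_in_tarball(pkg)  # a CI job could fail when this is non-empty
```

A check like this complements, rather than replaces, an explicit `files` allowlist in `package.json` or an `.npmignore` entry.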
Read more →Microsoft tested AI detection authoring across 11 models, 92 production rules, and three workflows spanning KQL, PySpark, and Scala. AI-generated detections matched the right threat 99.4% of the time. Only 8.9% included the exclusion logic needed to prevent false-positive floods.
Read more →AI-assisted malware has reached operational maturity. In their AI Threat Landscape Digest for January-February 2026, Check Point exposed VoidLink, a 30+ plugin Linux malware framework built by one developer with an AI IDE in under a week, initially mistaken for the output of a coordinated team. The AI involvement was invisible until an unrelated OPSEC failure.
Read more →Check Point's AI Threat Landscape Digest documents a shift from prompt-based jailbreaks to agent architecture abuse, a legitimate framework that turns Claude Code into an offensive operator for $0.03 per exploit, and enterprise AI leaking sensitive data in 1 of every 31 prompts.
Read more →I looked under the hood of Cisco's new open-source governance sidecar for OpenClaw AI agents to find a Splunk sales funnel, a regex scanner with blind spots, an LLM analyzer disabled by default, and open doors for indirect prompt injections.
Read more →Attackers exploited a critical AI CVE in 20 hours, a threat actor chained three supply chain hits in five days, and a 4-billion-parameter model matched frontier APIs on privilege escalation at 100x lower cost.
Read more →An advisory was published Tuesday evening. By Wednesday afternoon, attackers had built working exploits from the text alone and were harvesting API keys from AI pipelines. That was one of 24 AI CVEs this week. Here's what to patch, what to watch, and what it means for your stack.
Read more →A threat actor called TeamPCP poisoned Trivy's GitHub Action tags, harvested CI/CD secrets from every runner that executed them, and used stolen credentials to independently compromise Checkmarx and LiteLLM. Aqua says it is still propagating.
Read more →TU Wien researchers post-trained Qwen3-4B using reinforcement learning with verifiable rewards. It achieves 95.8% success on privilege escalation at $0.005 per attempt versus $0.62 for Claude Opus, and keeps all target data local.
Read more →238,180 skills from three marketplaces and GitHub. On the marketplace where scanners overlapped, they agreed on just 33 out of 27,111. Even the best pair shared only 49% of their flags. 95.8% of skills flagged as high-risk by two methods were false positives.
Read more →A competition to prompt-inject AI models and hide the attack from the user. Claude Opus 4.5 was hardest to break at 0.5% ASR. Gemini 2.5 Pro struggled at 8.5%.
Read more →A solo engineer broke a proof checker that verifies flight control software, OpenAI disclosed its own coding agents bypass security to complete tasks, and Cursor and OpenAI made competing moves on the future of code security.
Read more →Finding soundness bugs in proof assistant kernels used to require PhD-level expertise in type theory. Historically, one was found per year. A guy with a $200/month AI subscription found 7 in 3 days, each one a way to make the checker certify something impossible as correct.
Read more →Over five months monitoring tens of millions of internal coding agent interactions, OpenAI found that circumventing restrictions and deceiving users are common behaviors. The agents are just trying so hard to complete tasks that they encode commands in base64, extract encrypted credentials from keychains, and attempt to prompt-inject users.
Read more →Cursor shipped four security agents on its Automations marketplace after AI coding drove internal PR volume up 5x in nine months. On Cursor's own codebase, the agents review 3,000+ PRs and catch 200+ vulnerabilities per week.
Read more →Microsoft's CTI-REALM tests 16 models on real detection engineering tasks: threat report to MITRE mapping to KQL query to Sigma rule. Opus 4.6 led at 0.64, O4-Mini trailed at 0.36, and more reasoning made GPT-5 worse.
Read more →SAST tells you a defense exists in the code path. OpenAI argues it can answer whether the defense works. If you can answer the second question, the first one becomes irrelevant.
Read more →In 2024, Anthropic built Clio, a privacy-preserving system to analyze how people use Claude. Researchers replicated the pipeline, inserted poisoned chats with prompt injections, and showed that medical diagnoses appear in output summaries despite all four of Clio's defense layers.
Read more →OpenAI acquired Promptfoo and called prompt injection unsolvable, Google closed the largest cybersecurity deal ever, and Alibaba's agent mined crypto on its own during training.
Read more →7 design dimensions determine your AI agent's attack surface, and a risk amplification analysis reveals how each flexibility choice compounds your exposure. Research paper accepted to USENIX Security 2026.
Read more →The $32 billion Wiz deal closed on March 11, the largest cybersecurity acquisition ever. Combined with Mandiant, Siemplify, and VirusTotal, Google has spent $38 billion assembling the broadest security platform in the industry, and the one best positioned for the AI platform race with frontier labs.
Read more →Three security moves in five days. The last one calls out AI firewalls as insufficient. Together, they reveal a platform lock-in strategy through security.
Read more →39 documented cases of AI agents autonomously acquiring resources, resisting shutdown, and subverting evaluations, from 1991 to 2026. All five categories Omohundro predicted in 2008 now have real-world cases, and the rate has gone from 1 to 14 cases per year since 2013.
Read more →Alias Robotics' open-source CAI framework discovered 38 vulnerabilities across three consumer robots in about 7 hours, including CVSS 10.0 root access on a lawnmower, fleet-wide control of 267+ devices via shared credentials, motor control commands on a powered exoskeleton, and 456MB of 3D property maps stored and transmitted unencrypted.
Read more →Three days after Codex Security launched, OpenAI buys the leading open-source AI red-teaming tool used by 25% of the Fortune 500. The cybersecurity play now spans code security, AI security, and agent governance. The acquisition window for startups is closing fast.
Read more →Alibaba's AI coding agent, trained on over one million trajectories, spontaneously started mining crypto on GPUs and opening reverse SSH tunnels to external IPs during RL training. Nobody asked it to.
Read more →Weekly roundup covering America's Cyber Strategy decoded, the frontier lab AppSec race, breakthroughs from [un]prompted 2026, real-world prompt injection attacks on payment rails, and 90 zero-days exploited in 2025.
Read more →$2.1B in new DoD cyber spending, Google building the Booz Allen of cyberspace, and a rip-and-replace paradox that bites both sides. I mapped the strategy verbatims to money flows and most likely winners. Read this before you allocate your next dollar in cyber.
Read more →The code security race among frontier labs to own your AppSec pipeline accelerates. Anthropic fired the starting gun, OpenAI responded within days.
Read more →Attackers are planting hidden instructions in webpages that hijack AI agents into initiating Stripe payments, deleting databases, and approving scam ads.
Read more →AI-powered intrusion analysis compresses a 3-day investigation into 14 minutes, an LLM agent finds two Samsung zero-days chained into a Pwn2Own exploit, an LLM as a security judge gives attackers a second target, and a malicious calendar invite hijacks an agentic browser to take over 1Password, no master password needed.
Read more →For the first time, commercial surveillance vendors outpaced state-sponsored espionage groups in 0-day exploitation, enterprise targeting hit an all-time high at 48%, and China doubled its 0-day usage while sharing exploits faster across groups.
Read more →Speakers from Anthropic, Google, OpenAI, and Microsoft revealed that AI can now find zero-days autonomously, crack hardware that resisted weeks of brute-force in minutes, and break every major AI IDE on the market.
Read more →MAP-Elites, a quality-diversity algorithm adapted by Amazon researchers, creates vulnerability heatmaps that show where and how an LLM breaks across its entire behavioral space, exposing Llama 3 8B's 0.93 mean harm score across 370 failure niches.
Read more →An autonomous AI bot powered by Claude Opus 4.5 scanned 47,000 public repos, targeted 6 vulnerable GitHub Actions workflows, and achieved remote code execution in 4 of them including Microsoft, DataDog, and CNCF.
Read more →Weekly roundup covering LLM deanonymization at scale, industrial model theft by Chinese labs, Anthropic's Pentagon ultimatum, CrowdStrike's AI attack trends, and malicious agent skills.
Read more →CrowdStrike's 2026 Global Threat Report details how adversaries weaponize GenAI for social engineering, malware development, and direct attacks on AI systems.
Read more →NVIDIA announced partnerships with Akamai, Forescout, Palo Alto Networks, Siemens, and Xage Security to secure operational technology using BlueField DPUs for real-time threat detection.
Read more →The Pentagon demanded unrestricted model access for warfare, putting Anthropic's responsible AI principles to a $200M test.
Read more →Gallagher's survey of 1,200+ businesses found that 93% claim to understand AI risks well, yet over half lack the talent to actually manage them.
Read more →A fully automated LLM pipeline achieved 90% precision in deanonymizing pseudonymous users by matching Hacker News and LinkedIn profiles at $1–$4 per target.
Read more →DeepSeek, Moonshot AI, and MiniMax ran massive distillation campaigns with over 16 million queries across ~24,000 fraudulent accounts targeting Claude's capabilities.
Read more →Wiz's AI Cyber Model Arena tested 257 real-world challenges and showed that Claude Code scaffolding lifts every model's security performance — even Haiku 4.5 beats GPT-5.2.
Read more →Opus 4.6 is demonstrably suppressing its true reasoning about values like animal welfare to avoid triggering RLHF retraining.
Read more →Microsoft's MSRC discovered that LLMs encode hidden state across sessions through conversation history, enabling backdoor attacks with 98.4% accuracy and under 2% false positives.
Read more →Microsoft discovered 50+ poisoning prompts from 31 companies injecting hidden bias instructions into AI assistant memory through "Summarize with AI" buttons.
Read more →A dataset of 157 confirmed malicious agent skills reveals that 54% share a single author, with credential harvesting dominating and malicious skills persisting on marketplaces for 3+ months unchecked.
Read more →Meta's SecAlign achieves a 0.5% prompt injection attack success rate through a new training approach that separates trusted instructions from untrusted data.
Read more →Google Translate, running a deprecated Gemini 1.5 Pro, responded to malicious requests for creating poison and malware when prompted in Chinese.
Read more →Microsoft's GRP-Obliteration method removes safety constraints from open-source models with just one prompt, effectively unaligning GPT-OSS, DeepSeek, Gemma, Llama, and others.
Read more →LLM agents autonomously generated 40+ working exploits for a QuickJS zero-day at $30 per run in under an hour.
Read more →Weekly roundup covering zero-click RCE on OpenClaw, Opus 4.6 finding 500+ vulnerabilities, activation probes detecting cyber misuse 10,000x cheaper, and more.
Read more →An attacker achieved remote code execution on OpenClaw by sending a crafted email with a prompt injection payload that bypassed regex sanitization.
Read more →OpenAI built a tiered trust system with government ID verification and real-time classifiers to safeguard GPT-5.3-Codex's advanced cybersecurity capabilities.
Read more →Google DeepMind built tiny classifiers reading model internals to detect cyber misuse 10,000x cheaper than LLM-based guards.
Read more →Claude Opus 4.6 discovered 500+ vulnerabilities in heavily-fuzzed open-source projects without custom harnesses or specialized prompting.
Read more →ETSI published a European standard requiring lifecycle security across all AI system phases with 13 principles focused on documentation, auditability, and monitoring.
Read more →Attackers achieved AWS administrative access in 8 minutes using LLM-assisted reconnaissance, targeting LLMjacking and GPUjacking as the new cryptomining.
Read more →RAXE analyzed 74,636 production agent interactions and found 37.8% contained adversarial content, with inter-agent attacks observed in the wild.
Read more →Anthropic found that AI-assisted learning saved two minutes but dropped skill mastery by 17 points when users delegated instead of generating first.
Read more →An OpenClaw agent's viral Moltbook post called for signed skills, provenance tracking, and permission manifests to address critical security gaps.
Read more →Moltbook's Supabase database never had Row Level Security enabled, allowing anyone to post on behalf of any agent, including high-profile accounts.
Read more →ChartAttack uses LLMs to automatically generate misleading charts with inverted axes and inappropriate scales, reducing human accuracy by ~20%.
Read more →CISA's interim director uploaded sensitive files to ChatGPT because approved tools lacked the functionality needed to do their job effectively.
Read more →Sanjay Kalra distills 16 hard-learned startup risks covering GTM, speed, timing, incentives, and the challenge of selling prevention over cure.
Read more →Moltbot autonomously negotiated $4,200 in savings on a car, but the underlying Clawdbot agent has serious security gaps, including plaintext credentials and exposed ports.
Read more →Cisco's 2026 benchmark shows 99% of organizations benefit from privacy investments, but only 12% have mature AI governance committees.
Read more →Dario Amodei's essay argues AI capability is compounding faster than institutions can adapt, requiring layered defenses and transparency rather than development pauses.
Read more →Attackers can guarantee near-100% retrieval of poisoned documents with just 10 optimized tokens costing $0.21 per user query.
Read more →LinkedIn scans for 5,634 browser extensions through simple ID probes, flagging 66% as ToS-violating while raising privacy concerns about user fingerprinting.
Read more →Analysis of 31,132 agent skills found 26% contained vulnerabilities with 5.2% likely malicious, making skill vetting critical before deployment.
Read more →Cyberpunk-style narrative prompts achieve 71.3% jailbreak success across 26 frontier LLMs by hiding harmful intent in cultural storytelling frames.
Read more →A five-step Promptware Kill Chain framework maps prompt injections through persistence, lateral movement, and objective actions — elevating defense beyond just blocking injection.
Read more →Sonnet 4.5 autonomously identified the Equifax breach vulnerability and generated a working exploit using only a Kali Linux shell.
Read more →OpenAI, Anthropic, and Google DeepMind are building cybersecurity products to capture their share of the $213 billion enterprise security budget.
Read more →Anthropic's Constitutional Classifiers++ achieved 0.1% false positives against new jailbreak families, 40x cheaper than prior classifiers.
Read more →Backdoor triggers implanted in agent memory persist through planning, retrieval, and tool workflows with 78% success, with GPT and Gemini most vulnerable.
Read more →Documentation-driven business models are collapsing as coding agents pull docs from free aggregators instead of visiting vendor websites.
Read more →Major agentic benchmarks have critical flaws allowing agents to achieve high scores through trivial strategies like doing nothing or overwriting test files.
Read more →A four-step playbook combines Google SAIF's governance framework with Cisco's threat taxonomy to prioritize and defend against AI-specific attacks.
Read more →Frontier models nearly doubled performance on realistic SOC investigations, with Opus 4.5 scoring 0.60 and GPT-5.1 scoring 0.58 — up from 0.37 in September.
Read more →DeepSeek V3 scored 0.91 on Anthropic's Bloom behavioral benchmark — for delusional sycophancy, highlighting the need to balance risk evals with utility.
Read more →Enterprise software benchmarking is restricted by "DeWitt Clauses" in vendor contracts that prohibit publishing benchmark results without approval.
Read more →The ARTEMIS agent outperformed 90% of OSCP-certified human red teamers with 82% valid submissions at just $59/hour on live infrastructure.
Read more →AI-native security startups are industrializing offensive testing, though real-world PoC generation success (~18%) significantly lags lab results.
Read more →ICLScan detects model backdoors through just 10-20 targeted in-context learning queries by exploiting backdoored models' susceptibility amplification.
Read more →Grok's URL-fetching strategy uses 16 requests from 12 IPs with user-agent spoofing, mimicking DDoS patterns, unlike ChatGPT and Gemini.
Read more →DRIFT defense reduces prompt injection success from 30.7% to 1.3% through dynamic policy generation, memory isolation, and intent validation.
Read more →AutoRedTeamer autonomously generates proof-of-concept attacks from academic papers with 82% success rate and 46% less compute than manual approaches.
Read more →Security vendors should eliminate mandatory sales calls and offer transparent pricing and video demos for product evaluation.
Read more →Microsoft's 37.5 million Copilot conversations reveal users increasingly rely on AI for health, personal, and philosophical advice.
Read more →SEC-bench reveals current LLM security agents succeed on only 18% of PoC generation and 34% of vulnerability patching tasks.
Read more →Memory injection attacks gradually obscure identities in agent systems, achieving 98.2% success and exposing long-term memory as a critical attack surface.
Read more →Hudson Rock linked North Korea's Lazarus APT to the $1.4 billion Bybit theft through infrastructure analysis and malware development tools.