Anthropic, of all companies, just shipped three quality regressions in Claude Code that its own evals didn’t catch. Think about that: three regressions in just six weeks, from the most sophisticated eval shop in AI. If this can happen to Anthropic, it can certainly happen to you, and it likely will.
In a refreshingly candid postmortem, Anthropic walked through what went wrong. On March 4, the team flipped Claude Code’s default reasoning effort from high to medium because internal evals showed only “slightly lower intelligence with significantly less latency for the majority of tasks.” On March 26, a caching optimization meant to clear stale thinking after an hour of idleness shipped with a bug that cleared it on every turn instead. On April 16, two innocuous-looking lines of system prompt asking Claude to be more concise turned out to cost 3% on coding quality, a drop that showed up only on a wider ablation suite that wasn’t part of the standard release gate.
From inside the org, none of it trip
A practitioner's argument that meeting summarizers fail in the same way regressions fail when you skip the part where you ask what the data can support.
The post LLM Summarizers Skip the Identification Step appeared first on Towards Data Science.
If you have spent time using AI coding agents — GitHub Copilot, Claude Code, Gemini CLI — you have probably run into this situation: you describe what you want, the agent generates a block of code that looks correct, compiles, and then subtly misses the actual intent. This “vibe-coding” approach can work for quick prototypes […]
The post Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents appeared first on MarkTechPost.
Code for America is partnering with Anthropic on a new pilot intended to help staffers more efficiently administer public benefits by using an AI-powered tool to make policy information more accessible.
Everyone wants a piece of the enterprise AI pie, and this week, we saw a string of companies making their moves. From Anthropic and OpenAI announcing new joint ventures targeting enterprise AI deployment to SAP dropping $1B on German AI startup Prior Labs, it’s becoming clear that if you’re a startup building enterprise tools, you’re likely an acquisition target. On this episode of TechCrunch’s Equity podcast, hosts Kirsten Korosec, Anthony […]
Beijing-based Moonshot AI has raised approximately $2 billion at a $20 billion valuation, led by Meituan’s venture arm Long-Z Investments, with participation from Tsinghua Capital, China Mobile, and CPE Yuanfeng. The round brings total fundraising over the past six months to $3.9 billion, with the company’s valuation having risen from $4.3 billion at end-2025 to […]
The system’s power is comparable to others – but it still has frightening implications for the future of hacking
Last month, Anthropic made a remarkable announcement about its new model, Claude Mythos Preview: it was so good at finding security vulnerabilities in software that the company would not release it to the general public. Instead, it would only be available to a select group of companies to scan and fix their own software.
The announcement requires context – but it contained an essential truth.
How hook implementation gives Claude Code, Codex, and Cursor persistent memory via Neo4j, without locking you into any one of them.
The post Unified Agentic Memory Across Harnesses Using Hooks appeared first on Towards Data Science.