Anthropic, of all companies, just shipped three quality regressions in Claude Code that its own evals didn't catch. Think about that. Three regressions in six weeks, from the most sophisticated eval shop in AI. If this can happen to Anthropic, it can happen to you, and it likely will.
In a refreshingly candid postmortem, Anthropic walked through what went wrong. On March 4, the team flipped Claude Code's default reasoning effort from high to medium because internal evals showed only "slightly lower intelligence with significantly less latency for the majority of tasks." On March 26, a caching optimization meant to clear stale thinking after an idle hour shipped with a bug that cleared it on every turn instead. On April 16, two innocuous-looking lines of system prompt asking Claude to be more concise turned out to cost 3% on coding quality, a loss that surfaced only on a wider ablation suite that wasn't part of the standard release gate.
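The March 26 bug boils down to a dropped idle-time check: a clear that should have been conditional on an hour of inactivity fired unconditionally. A minimal sketch of the intended versus shipped logic (the function and variable names here are hypothetical; the postmortem does not publish the actual code):

```python
IDLE_TTL_SECONDS = 3600  # intended threshold: one idle hour

def should_clear_intended(last_activity_ts: float, now: float) -> bool:
    """Intended behavior: clear stale thinking only after an idle hour."""
    return (now - last_activity_ts) >= IDLE_TTL_SECONDS

def should_clear_buggy(last_activity_ts: float, now: float) -> bool:
    """Shipped behavior: the idle check was effectively lost,
    so the cache was cleared on every turn regardless of recency."""
    return True
```

The failure mode is instructive precisely because both versions pass a naive smoke test: clearing the cache is always "safe" for correctness, so only an eval sensitive to the quality cost of lost context would catch the difference.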
From inside the org, none of it trip
A practitioner's argument that meeting summarizers fail the same way regressions do: both happen when you skip the step of asking what the data can actually support.
The post LLM Summarizers Skip the Identification Step appeared first on Towards Data Science.
AI Library, an outcome-based software delivery startup founded in 2023 by Arani Chaudhuri, has raised $560,000 in pre-seed funding at a $7.5 million valuation cap to accelerate its AI agent-driven approach to enterprise software deployment. The company’s platform automates the software delivery lifecycle using AI agents with human oversight, targeting enterprise functions including finance, operations, […]
If you have spent time using AI coding agents — GitHub Copilot, Claude Code, Gemini CLI — you have probably run into this situation: you describe what you want, the agent generates a block of code that looks correct, compiles, and then subtly misses the actual intent. This “vibe-coding” approach can work for quick prototypes […]
The post Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents appeared first on MarkTechPost.
I counted at least 10 events in San Francisco last night aimed at matching AI startups with VCs. Just another Thursday.
But what made Camp AI's "Agents at Work" event (hosted by Auth0) stand out was its showcase of companies at various stages of reorganizing their engineering processes around AI agents. Browserbase, Mastra, Fireworks AI, Drata, Mya, MindFort, and Corridor are all part of the vendor ecosystem trying to enable secure, performant agentic AI, but the most revealing stories were about their own successes and the challenges of restructuring their engineering orgs around agents.
Agentic AI is reshaping team structures
Paul Klein IV, founder and CEO of Browserbase, delivered the night’s most memorable line while discussing the speed of AI adoption inside engineering teams. “If AI is not doing your whole job it’s a skill issue at this point,” said Klein.
Abhi Aiyer, founder and CTO of Mastra, said the result is dramatically smaller teams capable of executing much l
Code for America is partnering with Anthropic on a new pilot intended to help staffers more efficiently administer public benefits by using an AI-powered tool to make policy information more accessible.
Everyone wants a piece of the enterprise AI pie, and this week, we saw a string of companies making their moves. From Anthropic and OpenAI announcing new joint ventures targeting enterprise AI deployment to SAP dropping $1B on German AI startup Prior Labs, it’s becoming clear that if you’re a startup building enterprise tools, you’re likely an acquisition target. On this episode of TechCrunch’s Equity podcast, hosts Kirsten Korosec, Anthony […]
I left Google ten days ago to found my own company. It's been quite a journey figuring out how things work outside of the mothership, and I'm genuinely excited to share what I've learned from both sides of the house...