AI coding agents are making it easier than ever to produce software. Ensuring that software is secure before deployment is another matter — one that AWS thinks AI should help with too.
As enterprises adopt agentic development workflows, the volume of first-party code being created and modified is rising rapidly. Yet the process of validating vulnerabilities, determining whether they are exploitable, and fixing them often still depends on developers and security teams working through findings manually.
AWS is aiming to address that imbalance with Continuum, a new service designed to continuously discover, investigate, and remediate vulnerabilities in enterprise environments, whether the code is their own or from third parties.
Rather than simply generating alerts, the service is intended to help enterprises move findings through the entire remediation lifecycle, AWS VP of Security and Observability Chet Kapoor wrote in a blog post.
For first-party applications, Continuum can analyze code
AI coding agents tend to conduct research in isolation, running experiments and generating ideas that are forgotten once their context windows reset. This wastes tokens, because models then repeat the same mistakes and hit the same dead ends.
But new research argues that it’s not the model itself that needs tweaking, but the overarching ‘tree’ around it. To that end, data scientists from the Gaoling School of Artificial Intelligence at Renmin University of China and from Microsoft Research have introduced Arbor, a “persistent hypothesis tree” that helps agents remember and refine what they learn over long research sessions.
A long-lived coordinator manages research strategy across the tree, while short-lived executors spin up isolated worktrees to test different hypotheses. As results come back, the tree is updated, narrowing and refining the search as experimentation proceeds.
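The article does not show Arbor's code, so the following is only a hypothetical Python sketch of the general pattern it describes. The hypothesis tree is persisted to disk, which lets a long-lived coordinator skip hypotheses that earlier sessions already tested, and each short-lived executor returns a score. In the sketch, the executor is a plain callback standing in for a run in an isolated worktree. All class and function names, the JSON file, and the "beat the baseline" rule for marking a hypothesis supported are assumptions, not Arbor's actual API or scoring method.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Callable, Optional


@dataclass
class HypothesisNode:
    """One hypothesis in the (hypothetical) tree; children refine their parent."""
    id: str
    hypothesis: str
    parent: Optional[str] = None
    status: str = "open"  # "open", "supported", or "refuted"
    result: Optional[float] = None
    children: list[str] = field(default_factory=list)


class HypothesisTree:
    """Hypothetical persistent tree: state lives in a JSON file, not in a context window."""

    def __init__(self, path: Path):
        self.path = path
        self.nodes: dict[str, HypothesisNode] = {}
        if path.exists():
            # Reload everything earlier sessions learned.
            raw = json.loads(path.read_text())
            self.nodes = {k: HypothesisNode(**v) for k, v in raw.items()}

    def save(self) -> None:
        self.path.write_text(json.dumps({k: asdict(v) for k, v in self.nodes.items()}))

    def add(self, hypothesis: str, parent: Optional[str] = None) -> HypothesisNode:
        node = HypothesisNode(id=f"h{len(self.nodes)}", hypothesis=hypothesis, parent=parent)
        self.nodes[node.id] = node
        if parent is not None:
            self.nodes[parent].children.append(node.id)
        self.save()
        return node

    def already_tried(self, hypothesis: str) -> bool:
        # True if any session has already tested this exact hypothesis.
        return any(n.hypothesis == hypothesis and n.status != "open" for n in self.nodes.values())

    def record(self, node_id: str, score: float, baseline: float) -> None:
        # Assumed rule: a hypothesis is "supported" only if it beats the baseline.
        node = self.nodes[node_id]
        node.result = score
        node.status = "supported" if score > baseline else "refuted"
        self.save()


def coordinate(
    tree: HypothesisTree,
    proposals: list[tuple[str, Optional[str]]],
    run_executor: Callable[[str], float],
    baseline: float,
) -> Optional[HypothesisNode]:
    """Long-lived coordinator: sends untried hypotheses to short-lived executors."""
    for hypothesis, parent in proposals:
        if tree.already_tried(hypothesis):
            continue  # skip work a previous session already did
        node = tree.add(hypothesis, parent)
        score = run_executor(node.hypothesis)  # stand-in for an isolated worktree run
        tree.record(node.id, score, baseline)
    supported = [n for n in tree.nodes.values() if n.status == "supported"]
    return max(supported, key=lambda n: n.result, default=None)
```

Every update is written straight to disk, so a fresh coordinator that reloads the same file will not re-run hypotheses an earlier session already tested. That is the kind of repeated dead end the researchers describe. A fuller system would also let the coordinator propose child hypotheses under supported nodes, using the `parent` field.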
In practical tests, this technique delivered more than two-fold performance gains over standard AI coding agents across real-world engineering tasks, for the
AI coding agents are becoming critical to software development, but the configuration files that guide them, such as AGENTS.md or CLAUDE.md, can be “smelly.”
That means they can contain structural flaws, redundancies, or counterproductive workflows that bloat context, waste tokens, and make coding agents less reliable.
Researchers from the Department of Computer Science at Brazil’s Federal University of Minas Gerais hope to shed light on this problem, presenting what they call the “first catalog of smells” for coding agent configuration files. The most odorous? Lint and skill leakage, context bloat, and conflicting instructions.
“Our results show that these smells are widespread in practice,” the researchers wrote. Consequently, they “may directly influence how coding agents interpret project conventions, prioritize instructions, and perform development tasks.”
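The article does not say how the catalog's smells are detected. As a rough illustration only, the hypothetical Python sketch below flags four of the problems mentioned above using simple heuristics:

- File length as a proxy for context bloat.
- Repeated lines as redundancy.
- Formatting keywords as lint leakage, read here as style rules that a linter should enforce instead.
- Matching "always X" / "never X" pairs as conflicting instructions.

The thresholds, keyword list, and function names are all assumptions, not the researchers' criteria. Skill leakage is not covered.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds and keywords, chosen for illustration only.
MAX_LINES = 300
LINT_KEYWORDS = ("indent", "trailing whitespace", "semicolon", "line length", "quote style")


@dataclass
class Smell:
    kind: str
    line: int  # 1-based line number in the config file (0 = whole file)
    detail: str


def _normalize(text: str) -> str:
    # Lowercase, drop leading list markers, and strip punctuation
    # so that near-identical rules compare equal.
    text = re.sub(r"^[\s\-*\d.]+", "", text.lower())
    return re.sub(r"[^\w\s]", "", text).strip()


def find_smells(config: str) -> list[Smell]:
    """Flag a few heuristic 'smells' in an agent config such as AGENTS.md."""
    smells: list[Smell] = []
    lines = config.splitlines()
    if len(lines) > MAX_LINES:
        smells.append(Smell("context bloat", 0, f"{len(lines)} lines > {MAX_LINES}"))

    seen: dict[str, int] = {}  # normalized line -> first line number
    rules: dict[tuple[str, str], int] = {}  # (polarity, action) -> line number
    for i, raw in enumerate(lines, start=1):
        norm = _normalize(raw)
        if not norm:
            continue
        if norm in seen:
            smells.append(Smell("redundancy", i, f"repeats line {seen[norm]}"))
        else:
            seen[norm] = i
        if any(k in norm for k in LINT_KEYWORDS):
            smells.append(Smell("lint leakage", i, "style rule better enforced by a linter"))
        m = re.match(r"(always|never) (.+)", norm)
        if m:
            polarity, action = m.groups()
            opposite = "never" if polarity == "always" else "always"
            if (opposite, action) in rules:
                smells.append(Smell("conflicting instructions", i,
                                    f"contradicts line {rules[(opposite, action)]}"))
            rules[(polarity, action)] = i
    return smells
```

Heuristics this crude will miss paraphrased duplicates and subtler conflicts, which is part of why a manually built catalog like the researchers' is useful as a starting point.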
Smelly configs in the harness make models misbehave
Agents like Claude Code, Codex, Cursor, and Gemini are increasingly taking