The loop is tightening. Anthropic's new analysis shows that AI is already a meaningful participant in building the next generation of AI, and the pace is accelerating faster than most teams have accounted for.
The headline number: Anthropic engineers today ship 8x as much code per quarter as they did between 2021 and 2025. That multiplier comes directly from AI agents taking over larger and larger chunks of the development cycle, from generating snippets to running code autonomously and delegating work to other agents.
The capability curve on those agents is steep. The length of tasks that AI can reliably complete on its own has been doubling roughly every four months, faster than the earlier trend of doubling every seven months. In March 2024, Claude Opus 3 could handle software tasks that take a human about four minutes. A year later, Claude Sonnet 3.7 managed tasks requiring about an hour and a half. A year after that, Claude Opus 4.6 handled 12-hour tasks. If the trend holds, tasks that take a skilled person multiple days could come into range soon.
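To make the doubling arithmetic concrete, here is a minimal sketch that works out the doubling time implied by each pair of data points quoted above. It assumes "a year later" means exactly 12 months and treats the rounded task lengths (4 minutes, 90 minutes, 12 hours) as exact; the function names are illustrative and not taken from Anthropic's analysis.

```python
import math

def implied_doubling_months(start_minutes, end_minutes, elapsed_months):
    """Months per doubling implied by growth from start to end over elapsed_months."""
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings

def months_until(current_minutes, target_minutes, doubling_months):
    """Months needed to reach target_minutes if the doubling time stays constant."""
    return doubling_months * math.log2(target_minutes / current_minutes)

# Task lengths quoted in the text, in minutes of human working time.
# Assumption: consecutive milestones are exactly 12 months apart.
milestones = [
    ("Claude Opus 3", 4.0),          # about four minutes
    ("Claude Sonnet 3.7", 90.0),     # about an hour and a half
    ("Claude Opus 4.6", 12 * 60.0),  # 12 hours
]

def pairwise_doubling_times(points, gap_months=12.0):
    """Doubling time implied by each consecutive pair of milestones."""
    return [
        (earlier[0], later[0],
         implied_doubling_months(earlier[1], later[1], gap_months))
        for earlier, later in zip(points, points[1:])
    ]

if __name__ == "__main__":
    for earlier, later, months in pairwise_doubling_times(milestones):
        print(f"{earlier} -> {later}: doubling every {months:.1f} months")
```

Under these assumptions the second interval (90 minutes to 12 hours, an 8x increase, or three doublings) gives exactly four months per doubling, while the first interval implies a somewhat faster rate of roughly 2.7 months. The quoted figure of "roughly every four months" is a trend estimate, so exact agreement with two rounded data points should not be expected. `months_until` can be used to extrapolate to a longer target, with the caveat that any "multi-day" target length is the reader's own assumption.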
Milestones aside, the progression maps onto a real architectural shift. Early on, developers used chatbots to grab short code snippets. Then came coding agents that could write and edit entire files. Now agents run code themselves and hand off hours of work to subordinate agents. The next step, which Anthropic labels "closing the loop," is agents capable of building and training models themselves. That is what recursive self-improvement actually means: future model versions improved by the current model, with humans increasingly out of the critical path.
Anthropic is explicit that this is not inevitable and that we are not there yet. But the data they are sharing suggests the gap between now and that threshold is shrinking faster than most institutions are prepared for. The safety implications are direct: if a system can build its own successor, the mechanisms for monitoring it, securing it, and shaping its behavior become far more load-bearing than they are today.
For product engineers, the practical read is this. If you are building on top of AI coding agents today, the capability floor is rising fast and will keep rising. That means two things worth acting on now. First, invest in evals that measure task reliability at longer time horizons, not just single-turn correctness. The metric that matters is how far your agent can run unsupervised before it fails. Second, build your observability layer before you need it. As agents delegate to other agents, tracing what actually happened inside a multi-hour run becomes non-trivial. Getting that infrastructure in place now, while task horizons are still measured in hours rather than days, is much easier than retrofitting it later.
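Both recommendations can be prototyped in a few dozen lines. The sketch below is one hypothetical way to do it, not a method described in Anthropic's analysis: `reliability_horizon` reports the longest task-length bucket an agent clears at a chosen success rate, and `Span` records which agent delegated what to whom so a long run can be reconstructed afterwards. The bucket bounds, the 80% threshold, and all class and function names are assumptions chosen for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RunResult:
    task_minutes: float  # estimated human time for the task
    succeeded: bool      # did the agent finish without intervention?

def reliability_horizon(results, bounds=(5, 15, 60, 240, 720), threshold=0.8):
    """Longest bucket bound (minutes) the agent clears at `threshold` success.

    Buckets are checked shortest first; the walk stops at the first bucket
    that fails the threshold or has no data, so nothing is extrapolated.
    """
    horizon = 0.0
    lower = 0.0
    for upper in bounds:
        in_bucket = [r for r in results if lower < r.task_minutes <= upper]
        if not in_bucket:
            break  # no evidence at this task length
        success_rate = sum(r.succeeded for r in in_bucket) / len(in_bucket)
        if success_rate < threshold:
            break
        horizon = upper
        lower = upper
    return horizon

@dataclass
class Span:
    """One unit of agent work; children are tasks delegated to other agents."""
    agent: str
    action: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)
    children: List["Span"] = field(default_factory=list)

    def delegate(self, agent, action):
        """Record a hand-off to another agent and return the child span."""
        child = Span(agent=agent, action=action, parent_id=self.span_id)
        self.children.append(child)
        return child

def render(span, depth=0):
    """Flatten a span tree into indented lines, one per span."""
    lines = ["  " * depth + f"{span.agent}: {span.action}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines

if __name__ == "__main__":
    root = Span("planner", "ship feature")
    coder = root.delegate("coder", "write module")
    coder.delegate("tester", "run tests")
    print("\n".join(render(root)))
```

Tracking the horizon per release turns "how far can the agent run unsupervised" into a single number that can be charted over time. The span tree is deliberately minimal; a production system would more likely emit these records through an existing tracing standard than maintain a custom structure.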