May 28, 2026 · wwwatch

May 28, 2026

Two releases stand out today. vLLM 0.21.0 lands Blackwell GPU support alongside KV offloading tied into the Hybrid Memory Allocator, plus two breaking changes your team should review before upgrading. On the tooling side, Unsloth Studio now connects to cloud APIs from OpenAI and Anthropic, with built-in web search, code execution, and prompt caching that reportedly cuts costs by 50 to 90%. Elsewhere, AutoGPT Platform beta adds Copilot scheduling and a skills registry, and a new RL method called AXPO addresses tool-use collapse in multimodal agents.

tool

Unsloth Studio Connects to Cloud APIs with Web Search and Code Execution

Unsloth Studio now connects to OpenAI, Anthropic, and other cloud API providers, adding built-in web search, code execution, image generation, and prompt caching that cuts costs by 50 to 90%. A larger revamp with major features and design changes is coming within weeks.

research

New RL Method Fixes Tool Use Collapse in Multimodal Agents

Standard RL training for agentic vision-language models suppresses learning at tool calls, causing tool use on only ~30% of rollouts. AXPO resamples those failing subgroups and recovers the signal.

framework

Hermes 0.14 Turns Any OAuth AI Sub Into a Local API Endpoint

Hermes Agent v0.14.0 ships a local OpenAI-compatible proxy that routes Claude Pro, ChatGPT Pro, and SuperGrok into any coding tool that speaks the OpenAI API. The release also cuts cold-start time by ~19 seconds and makes `pip install hermes-agent` work cleanly from PyPI.

framework

AutoGPT Platform Gets Copilot Scheduling, Skills Registry, and Bot Commands

AutoGPT Platform beta v0.6.62 ships native scheduling for Copilot followups, a self-distilled skills registry, and a suite of bot commands that make conversational agents easier to build and deploy. The release also adds public sharing for agent chat results and a cost breakdown panel.

infra_api

vLLM 0.21.0 Brings Blackwell Support and Smarter KV Offloading

vLLM v0.21.0 ships KV offloading integrated with the Hybrid Memory Allocator, a new attention backend for Blackwell GPUs, and speculative decoding that respects reasoning budgets. Two breaking changes require immediate attention from teams building on vLLM.