June 4, 2026 · wwwatch

Security leads this week. Anthropic published a detailed breakdown of how it limits agent blast radius across its products, including a striking finding: users approved 93% of permission prompts, which makes human oversight an unreliable safeguard at scale. Separately, GPT-5.5 exploited a common Firebase misconfiguration 70% of the time in a controlled benchmark, while most of the other models tested scored zero. On the tooling side, AnythingLLM v1.13.0 ships automatic model routing between local and cloud models, a practical cost-cutting option for teams running mixed inference setups.

security

GPT-5.5 Solved a Real Firebase Exploit Seven Times Out of Ten

A security researcher built a deliberately vulnerable mobile app and ran nine LLMs against it to see which could exploit a common Firebase misconfiguration. GPT-5.5 succeeded 70% of the time; most of the others scored zero.

security

Anthropic Reveals How It Caps Agent Blast Radius Across Products

Anthropic's engineering team details the containment strategies built for claude.ai, Claude Code, and Cowork, and explains why sandboxes and access boundaries beat human-in-the-loop supervision. One key finding: users approved 93% of permission prompts, which makes prompt-based oversight unreliable over time.

tool

GPT-Rosalind Adds Biological Reasoning and Genomics for Life Sciences Builders

GPT-Rosalind now brings enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow support to life sciences teams. Builders working in drug discovery or genomics research have new capabilities to integrate today.

eval

OpenCompass 0.5.2 Adds 14 Benchmarks and New Model Support

OpenCompass 0.5.2 ships support for 14 new benchmarks spanning science, math, and instruction-following, plus new model and API integrations. Product engineers who maintain evaluation pipelines may want to review the additions before updating.

tool

AnythingLLM Lets You Route Queries Between Local and Cloud Models Automatically

AnythingLLM v1.13.0 ships a Model Router that automatically sends each message to the right model, mixing local and cloud AI in a single conversation. Builders can now cut API costs without giving up quality on complex tasks.