June 7, 2026 · wwwatch

June 7, 2026

Three infrastructure updates worth a quick look today. vLLM v0.22.1 patches multi-node Ray serving hangs and DeepSeek-V4 init failures, making it a worthwhile upgrade if you run distributed deployments. Ollama v0.30.6 adds QAT-optimized Gemma 4 weights that reduce memory pressure and introduces a coding agent hook via `ollama launch omp`. On the security side, Dify v1.14.2 tightens tenant isolation and locks tool credential updates to admins, with fixes worth reviewing if you run multi-tenant workflows.

infra_api

Ollama Adds Gemma 4 QAT Weights and an AI Coding Agent Hook

Ollama v0.30.6 ships QAT-optimized Gemma 4 models for lower memory usage, plus a new coding agent integration via `ollama launch omp`. Apple Silicon users also get an MLX quantization improvement.

security

Dify Patches Tenant Isolation, Workflow Tracing, and Tool Credentials

Dify v1.14.2 tightens tenant isolation, locks down tool credential updates to admins, and fixes a cluster of workflow reliability bugs including broken tracing after human-in-the-loop resume. Here is what changed and what to act on now.

infra_api

vLLM 0.22.1 Fixes Multi-Node Serving and Adds AMD Zen Acceleration

vLLM v0.22.1 ships targeted bug fixes for multi-node Ray serving hangs, DeepSeek-V4 initialization failures, and several model-loading regressions, plus new support for JetBrains' Mellum v2 and zentorch-accelerated inference on AMD Zen CPUs.