Ollama Adds Gemma 4 QAT Weights and an AI Coding Agent Hook

Ollama v0.30.6 ships three changes worth paying attention to: QAT-optimized Gemma 4 weights, a new coding agent launcher, and a quantization fix for Apple Silicon.

Gemma 4 QAT weights are the headline. Google's Gemma 4 family now has Quantization-Aware Training (QAT) variants available on Ollama. The difference matters at the hardware level: QAT simulates low-precision weights during training, so the model learns to tolerate quantization error. The resulting weights get the memory savings of a quantized model while preserving more quality than post-training quantization typically does at the same bit width. Five sizes are available, tagged with -qat:

gemma4:e2b-it-qat
gemma4:e4b-it-qat
gemma4:12b-it-qat
gemma4:26b-a4b-it-qat
gemma4:31b-it-qat

If you have been running Gemma 4 and hitting memory pressure, swapping to a QAT tag is the first thing to try. The library page at ollama.com/library/gemma4 has the full tag listing.
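For a rough sense of what a lower bit width buys, the weight footprint can be estimated as parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic, not a measurement. The function name is hypothetical, and the 16-bit and 4-bit figures are assumptions chosen for illustration, since the release summary here does not state the bit width of the -qat tags. It also ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: counts weights only, not KV cache or activations.
    """
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_weight > 0")
    # bits -> bytes -> gigabytes
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed bit widths for illustration only; check the library page
    # for the actual download size of each tag.
    for bits in (16, 4):
        size = estimate_weight_gb(31e9, bits)
        print(f"31B parameters at {bits}-bit: ~{size:.1f} GB of weights")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the kind of gap that decides whether a given size fits on your machine at all.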

ollama launch omp is a new entry point for AI coding workflows. Running that command now connects to Oh My Pi, an AI coding agent with IDE integration. The pattern follows how ollama launch has been used to wire local models into tooling, but this is the first time it points at a dedicated coding agent surface. If your team is evaluating local-first coding agents, this is worth a quick test run today.

Apple Silicon gets a small but meaningful fix. MLX embedding layers now use NVFP4 global scale for quantization. This improves quantization accuracy for embedding workloads on Apple Silicon hardware. If you are running embedding-heavy pipelines on an M-series Mac, the update should produce better numerical behavior without any code changes on your side.

The release is incremental, but the QAT models are the part that changes practical decisions. Running a 31-billion-parameter model locally has been gated on having enough VRAM or unified memory. QAT weights lower that bar. Pull the -qat tag for whichever Gemma 4 size fits your hardware, benchmark it against your current setup, and see whether you can move to a larger model tier without adding memory.
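One way to run that benchmark is to compare decode throughput across tags through Ollama's local HTTP API. This is a minimal sketch, assuming the server is listening on its default address (localhost:11434) and that the non-streaming /api/generate response includes the eval_count and eval_duration (nanoseconds) fields. The helper names, the prompt, and the choice of tags to compare are illustrative, not part of the release.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode throughput from eval_count and eval_duration (nanoseconds)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] * 1e9 / duration_ns


def run_once(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


if __name__ == "__main__":
    # Illustrative pair: replace with your current tag and the -qat tag
    # you want to evaluate. Both must already be pulled.
    tags = ["gemma4:12b-it-qat", "gemma4:26b-a4b-it-qat"]
    prompt = "Explain what a mutex is in two sentences."
    for tag in tags:
        result = run_once(tag, prompt)
        print(f"{tag}: {tokens_per_second(result):.1f} tokens/s")
```

Throughput is only half the comparison. Run the same prompts you care about through both tags and check output quality as well, and repeat each run a few times, since the first request after a model loads includes load time.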