Releases: ollama/ollama
v0.30.8
What's Changed
- Fixed ollama launch selecting the wrong provider in some cases
- Improved prompt caching by decoupling it from context shift for better KV cache reuse
- More stable MLX inference with hardened linear and embedding layers
- MLX runner now creates snapshots during prompt processing and speculative decoding for improved reliability
- Improved recurrent model support with per-boundary states from the gated-delta kernels
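As a rough illustration of why KV cache reuse matters for prompt caching, the sketch below counts how many leading tokens of a new prompt match a previously cached one; only the remaining tokens would need to be processed again. This is a generic longest-common-prefix sketch, not Ollama's implementation. The function names and token IDs are hypothetical.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Return the number of leading tokens shared by both sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_process(cached: Sequence[int], prompt: Sequence[int]) -> Sequence[int]:
    """Return the suffix of `prompt` that the cached prefix does not cover."""
    return prompt[common_prefix_len(cached, prompt):]


# Hypothetical token IDs: the two prompts share their first three tokens.
cached_tokens = [101, 7, 42, 9, 13]
new_prompt = [101, 7, 42, 55, 56, 57]

reused = common_prefix_len(cached_tokens, new_prompt)
remaining = tokens_to_process(cached_tokens, new_prompt)
print(reused, remaining)  # 3 [55, 56, 57]
```

The longer the shared prefix, the less prompt processing is repeated, which is the benefit the caching item above refers to.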
Full Changelog: v0.30.7...v0.30.8
v0.30.7
Ollama Launch now supports Hermes Desktop, a native desktop interface for the Hermes agent. Run it alongside your Hermes agent to get a visual interface for managing conversations, integrations, and messaging apps.
ollama launch hermes-desktop
What's Changed
- Hermes Desktop is now available via ollama launch hermes-desktop, with native Windows configuration path support
- OpenAI-compatible API models list now aligns with available model tags
- Added documentation describing the llama.cpp update process
- Updated Zod schema examples to use the native toJSONSchema helper
Full Changelog: v0.30.6...v0.30.7
v0.30.6
New models
- Gemma 4 QAT weights: the Gemma 4 family is now optimized with Quantization-Aware Training (QAT) to dramatically reduce memory requirements and maximize on-device performance. Look for the tags ending in -qat:
  - gemma4:e2b-it-qat
  - gemma4:e4b-it-qat
  - gemma4:12b-it-qat
  - gemma4:26b-a4b-it-qat
  - gemma4:31b-it-qat
What's Changed
- ollama launch omp now integrates with Oh My Pi, an AI coding agent with IDE integration
- MLX embedding layers now use NVFP4 global scale for improved quantization on Apple Silicon
Full Changelog: v0.30.5...v0.30.6
v0.30.5
What's Changed
- Fixed the gemma4:12b floating point exception crash on x86, CUDA, Linux, and Windows systems.
- ollama launch hermes-desktop now launches Hermes Desktop and can skip rebuilding when a packaged desktop app is already installed.
- ollama launch hermes now supports native Windows installs through the Hermes PowerShell installer.
- Added Cline CLI integration docs.
Full Changelog: v0.30.4...v0.30.5
v0.30.4
New models
- Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.
What's Changed
- Fixed multimodal models not using the GPU on the llama.cpp backend. They can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
- ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
- ollama launch codex now cleans up old conflicting Codex profile config before launching.
- ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
- Pi web search setup now updates only when a newer package is available.
- Windows cleanup now terminates the llama.cpp backend more reliably.
- Updated the llama.cpp backend.
Known Issues
- gemma4:12b crashes with a floating point exception
Full Changelog: v0.30.3...v0.30.4
v0.30.3
New models
- Gemma 4 12B: high-performance multimodal intelligence that runs directly on laptops, combining efficiency with advanced reasoning.
What's Changed
- Added support for gemma4:12b.
Full Changelog: v0.30.2...v0.30.3
v0.30.2
What's Changed
- ollama launch now supports Qwen Code and can guide users through installing the Cline CLI when it is missing.
- ollama launch codex now uses an isolated launch configuration, avoiding conflicts with a user's existing Codex settings.
- Added llama.cpp backend compatibility support for Poolside's Laguna architecture.
- The llama.cpp backend now includes cached prompt tokens in token accounting, improving usage reporting for requests with prompt cache hits.
- The llama.cpp backend now ignores SSE ping comments, improving streaming compatibility with newer backend behavior.
- The llama.cpp backend now detects load stalls from server output so failed model loads surface more reliably instead of hanging.
- Radeon 8060S integrated GPUs are now allowed by default.
- Template details are included in logs to make troubleshooting model prompts easier.
- Added Hermes Desktop configuration docs.
- Fixed a build issue in the Laguna compatibility patch, restoring Laguna support in release builds.
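For readers unfamiliar with SSE ping comments: in the server-sent events format, a line beginning with a colon is a comment, and servers often send such lines as keep-alive pings. A minimal client-side sketch of skipping them while collecting data payloads is shown below. It is not the code used by Ollama or llama.cpp; the function name and the sample payloads are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # Comment line (commonly used as a keep-alive ping): ignore it.
            continue
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # A single leading space after the colon is not part of the value.
            value = value[1:]
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical stream: two data events interleaved with ping comments.
stream = [
    ": ping\n",
    'data: {"token": "Hel"}\n',
    "\n",
    ": ping\n",
    "data: [DONE]\n",
    "\n",
]
print(list(iter_sse_data(stream)))  # ['{"token": "Hel"}', '[DONE]']
```

A parser that treats comment lines as malformed events would fail on such a stream, which is the kind of incompatibility the item above addresses.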
Full Changelog: v0.30.0...v0.30.2
v0.30.0
Ollama 0.30 is now available, with improved compatibility and performance using llama.cpp. This augments the MLX engine on Apple Silicon, bringing support to a wider range of hardware.
This release brings support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models, along with faster performance on NVIDIA hardware.
Known issues:
- laguna-xs.2 is not yet supported on Windows/Linux.
- llama3.2-vision is not yet supported.
- nomic-embed-text now converts inputs to lowercase per the model card, where prior Ollama versions incorrectly preserved mixed case.
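One practical consequence of the nomic-embed-text change is that inputs differing only in case are now treated as the same text, so embeddings stored from earlier versions for mixed-case inputs may no longer match newly generated ones. The sketch below shows the effect on a set of inputs. It assumes plain lowercasing, since the exact normalization Ollama applies is not described in these notes, and the helper name is hypothetical.

```python
def normalize_for_nomic(text: str) -> str:
    """Lowercase the input (assumed approximation of the 0.30 behavior)."""
    return text.lower()


# Three inputs that differ only in letter case.
inputs = ["Hello World", "hello world", "HELLO WORLD"]

# After normalization they collapse to a single distinct input.
unique_inputs = sorted({normalize_for_nomic(t) for t in inputs})
print(unique_inputs)  # ['hello world']
```

If an existing index was built with an earlier version, re-embedding the stored documents is one way to keep queries and documents consistent.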
v0.24.0
Codex App
Ollama 0.24 includes support for the Codex App, OpenAI's desktop experience for working on Codex threads in parallel with built-in worktree support and git functionality.
ollama launch codex-app
Built-in browser
Codex can load local servers and sites in its built-in browser, enabling you to directly annotate on the page to request changes.
Review mode
Review code inside the app, leave comments, and iterate without leaving your workspace.
Choosing a model
For difficult coding and agentic tasks:
- kimi-k2.6 (with vision support)
- glm-5.1
For local use without an Ollama Cloud subscription:
- nemotron-3-super
- gemma4:31b
- qwen3.6
Restore anytime
To restore the previous configuration of Codex App, run:
ollama launch codex-app --restore
What's Changed
- Reworked the MLX sampler for improved generation quality on Apple Silicon
Full Changelog: v0.23.0...v0.24.0
v0.23.4
What's Changed
- ollama launch opencode now supports vision models with image inputs
- Fixed formatting of Claude tool results when using local image paths
Full Changelog: v0.23.3...v0.23.4