Releases: ollama/ollama

v0.30.8

12 Jun 17:04
12e0437

What's Changed

  • Fixed ollama launch selecting the wrong provider in some cases
  • Improved prompt caching by decoupling it from context shift for better KV cache reuse
  • More stable MLX inference with hardened linear and embedding layers
  • MLX runner now creates snapshots during prompt processing and speculative decoding for improved reliability
  • Improved recurrent model support with per-boundary states from the gated-delta kernels
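
As a rough illustration of what KV cache reuse means for prompt caching, the sketch below compares a cached token sequence with a new prompt and keeps the longest shared prefix, so only the remaining tokens need processing. It is not Ollama's implementation; the function names and token values are hypothetical.

```python
def reusable_prefix_len(cached_tokens, prompt_tokens):
    """Count how many leading tokens of the new prompt match the cached sequence."""
    n = 0
    for cached, new in zip(cached_tokens, prompt_tokens):
        if cached != new:
            break
        n += 1
    return n


def plan_prompt(cached_tokens, prompt_tokens):
    """Return (tokens whose cache entries can be kept, tokens still to process)."""
    keep = reusable_prefix_len(cached_tokens, prompt_tokens)
    return keep, prompt_tokens[keep:]


# Hypothetical token IDs: the first three tokens are shared with the cache.
cached = [1, 15, 27, 42, 99]
prompt = [1, 15, 27, 58, 60, 61]
keep, to_process = plan_prompt(cached, prompt)
print(keep, to_process)  # 3 [58, 60, 61]
```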

Full Changelog: v0.30.7...v0.30.8

v0.30.7

07 Jun 21:51
f0078ae

Ollama Launch now supports Hermes Desktop, a native desktop interface for the Hermes agent. Run it alongside your Hermes agent to get a visual interface for managing conversations, integrations, and messaging apps.

ollama launch hermes-desktop

What's Changed

  • Hermes Desktop is now available via ollama launch hermes-desktop with native Windows configuration path support
  • OpenAI-compatible API models list now aligns with available model tags
  • Added documentation describing the llama.cpp update process
  • Updated Zod schema examples to use the native toJSONSchema helper
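
To check which models the OpenAI-compatible API lists, a client can read the models endpoint and collect the returned IDs. The sketch below assumes a local server at http://localhost:11434/v1 and the standard OpenAI list shape (an object with a "data" array of entries carrying an "id"); the sample payload is illustrative, not real server output.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default address.
BASE_URL = "http://localhost:11434/v1"


def model_ids(payload):
    """Extract sorted model IDs from an OpenAI-style list response."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(base_url=BASE_URL):
    """Query the models endpoint; requires a running server."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))


# Illustrative payload so the parsing step can run without a server.
sample = {
    "object": "list",
    "data": [
        {"id": "gemma4:31b", "object": "model"},
        {"id": "gemma4:12b", "object": "model"},
    ],
}
print(model_ids(sample))  # ['gemma4:12b', 'gemma4:31b']
```

Comparing this list with the tags shown by ollama list is one way to confirm the two agree.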

Full Changelog: v0.30.6...v0.30.7

v0.30.6

05 Jun 20:00
87cff95

New models

  • Gemma 4 QAT weights: the Gemma 4 family is now optimized with Quantization-Aware Training (QAT) to dramatically reduce memory requirements and maximize on-device performance. Look for the tags ending in -qat:
    • gemma4:e2b-it-qat
    • gemma4:e4b-it-qat
    • gemma4:12b-it-qat
    • gemma4:26b-a4b-it-qat
    • gemma4:31b-it-qat

What's Changed

  • ollama launch omp now integrates with Oh My Pi, an AI coding agent with IDE integration
  • MLX embedding layers now use NVFP4 global scale for improved quantization on Apple Silicon

Full Changelog: v0.30.5...v0.30.6

v0.30.5

04 Jun 17:00
3370ff8

What's Changed

  • Fixed the gemma4:12b floating point exception crash on x86, CUDA, Linux, and Windows systems.
  • ollama launch hermes-desktop now launches Hermes Desktop and can skip rebuilding when a packaged desktop app is already installed.
  • ollama launch hermes now supports native Windows installs through the Hermes PowerShell installer.
  • Added Cline CLI integration docs.

Full Changelog: v0.30.4...v0.30.5

v0.30.4

03 Jun 18:48
229a130

New models

  • Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

What's Changed

  • Fixed multimodal models not using the GPU on the llama.cpp backend: they can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
  • ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
  • ollama launch codex now cleans up old conflicting Codex profile config before launching.
  • ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
  • Pi web search setup now updates only when a newer package is available.
  • Windows cleanup now terminates the llama.cpp backend more reliably.
  • Updated the llama.cpp backend.

Known Issues

  • gemma4:12b crashes with a floating point exception.

Full Changelog: v0.30.3...v0.30.4

v0.30.3

03 Jun 16:35
50bbda5

New models

  • Gemma 4 12B: high-performance multimodal intelligence that runs directly on laptops, combining efficiency with advanced reasoning.

What's Changed

  • Added support for gemma4:12b.

Full Changelog: v0.30.2...v0.30.3

v0.30.2

03 Jun 00:37
4b5bdd3

What's Changed

  • ollama launch now supports Qwen Code and can guide users through installing the Cline CLI when it is missing.
  • ollama launch codex now uses an isolated launch configuration, avoiding conflicts with a user's existing Codex settings.
  • Added llama.cpp backend compatibility support for Poolside's Laguna architecture.
  • The llama.cpp backend now includes cached prompt tokens in token accounting, improving usage reporting for requests with prompt cache hits.
  • The llama.cpp backend now ignores SSE ping comments, improving streaming compatibility with newer backend behavior.
  • The llama.cpp backend now detects load stalls from server output so failed model loads surface more reliably instead of hanging.
  • Radeon 8060S integrated GPUs are now allowed by default.
  • Template details are included in logs to make troubleshooting model prompts easier.
  • Added Hermes Desktop configuration docs.
  • Fixed a build issue in the Laguna compatibility patch, restoring Laguna support in release builds.
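
For context on the SSE ping change: in the Server-Sent Events format, a line that begins with a colon is a comment, and servers often send such lines as keep-alive pings. The sketch below shows one way a client can skip them while collecting data payloads. It is a simplified illustration, not the code Ollama uses, and the stream contents are made up.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event, skipping comment lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. a keep-alive ping
        if line == "":
            # A blank line ends the current event.
            if data:
                yield "\n".join(data)
                data = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data.append(value)
    # An unterminated event at the end of the stream is dropped.


# Made-up stream: two data events interleaved with ping comments.
stream = [
    ": ping\n",
    'data: {"token": "Hel"}\n',
    "\n",
    ": ping\n",
    'data: {"token": "lo"}\n',
    "\n",
]
print(list(iter_sse_data(stream)))  # ['{"token": "Hel"}', '{"token": "lo"}']
```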

Full Changelog: v0.30.0...v0.30.2

v0.30.0

13 May 14:32
2c71d8d

Ollama 0.30 is now available, with improved compatibility and performance using llama.cpp. This augments the MLX engine on Apple Silicon, bringing support to a wider range of hardware.

This release brings support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models, along with faster performance on NVIDIA hardware.

Known issues:

  • laguna-xs.2 is not yet supported on Windows/Linux.
  • llama3.2-vision is not yet supported.
  • nomic-embed-text now converts inputs to lowercase per the model card, whereas prior Ollama versions incorrectly preserved mixed case.

v0.24.0

14 May 02:24
c28ddc0

Codex App

Ollama 0.24 includes support for the Codex App, OpenAI's desktop experience for working on Codex threads in parallel with built-in worktree support and git functionality.

ollama launch codex-app

Built-in browser

Codex can load local servers and sites in its built-in browser, so you can annotate the page directly to request changes.

Review mode

Review code inside the app, leave comments, and iterate without leaving your workspace.

Choosing a model

For difficult coding and agentic tasks:

  • kimi-k2.6 (with vision support)
  • glm-5.1

For local use without an Ollama Cloud subscription:

  • nemotron-3-super
  • gemma4:31b
  • qwen3.6

Restore anytime

To restore the previous configuration of Codex App, run:

ollama launch codex-app --restore

What's Changed

  • Reworked the MLX sampler for improved generation quality on Apple Silicon

Full Changelog: v0.23.0...v0.24.0

v0.23.4

13 May 20:40
3af1a00

What's Changed

  • ollama launch opencode now supports vision models with image inputs
  • Fixed formatting of Claude tool results when using local image paths

Full Changelog: v0.23.3...v0.23.4