Releases: ollama/ollama

v0.30.10

17 Jun 16:22
e1f7f9c

What's Changed

  • Command A and North family models now run on Apple Silicon with the MLX engine
  • Updated the underlying llama.cpp engine to build 9672
  • Fixed build artifacts for MLX

Full Changelog: v0.30.9...v0.30.10

v0.30.9

15 Jun 19:55
0f047fe

What's Changed

  • Support for Cohere2Moe architecture
  • Fixed LFM2 parser/render for cases where thinking was not emitted
  • Fixed an issue where ollama launch claude and other coding agent or assistant use cases would output only one token
  • Ollama will now return an error if a single message is larger than the current context window
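
Because a single oversized message is now rejected, a client may want to check message size before sending. The following is a minimal sketch, not Ollama's own check. It assumes a rough ratio of 4 characters per token (real counts depend on the model's tokenizer), and the helper names estimate_tokens and fits_context are hypothetical. num_ctx stands for the context window the model is loaded with.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4 chars/token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_context(message: str, num_ctx: int) -> bool:
    """Return True if the estimated size of one message is within num_ctx tokens."""
    return estimate_tokens(message) <= num_ctx
```

A caller could split or summarize a message when fits_context returns False instead of waiting for the server error.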

Full Changelog: v0.30.8...v0.30.9-rc1

v0.30.8

12 Jun 17:04
12e0437

What's Changed

  • Fixed ollama launch selecting the wrong provider in some cases
  • Improved prompt caching by decoupling it from context shift for better KV cache reuse
  • More stable MLX inference with hardened linear and embedding layers
  • MLX runner now creates snapshots during prompt processing and speculative decoding for improved reliability
  • Improved recurrent model support with per-boundary states from the gated-delta kernels
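
KV cache reuse generally depends on how much of a new prompt matches tokens that were already processed. The sketch below illustrates that general idea only and is not Ollama's implementation. The function name reusable_prefix_len is hypothetical, and tokens are represented as plain integers.

```python
from typing import Sequence


def reusable_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Count leading tokens shared by the cached sequence and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n
```

Only the tokens after the shared prefix would need to be processed again in this simplified model.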

Full Changelog: v0.30.7...v0.30.8

v0.30.7

07 Jun 21:51
f0078ae

Ollama Launch now supports Hermes Desktop, a native desktop interface for the Hermes agent. Run it alongside your Hermes agent to get a visual interface for managing conversations, integrations, and messaging apps.

ollama launch hermes-desktop

What's Changed

  • Hermes Desktop is now available via ollama launch hermes-desktop with native Windows configuration path support
  • OpenAI-compatible API models list now aligns with available model tags
  • Added documentation describing the llama.cpp update process
  • Updated Zod schema examples to use the native toJSONSchema helper
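
To inspect what the OpenAI-compatible models list returns, a small client can read the model ids from the response. This is a sketch under stated assumptions: the server is reachable at the default local address http://localhost:11434, the response follows the OpenAI list shape with a "data" array of objects carrying an "id", and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:11434") -> list[str]:
    """Query the OpenAI-compatible models endpoint of a local server."""
    with urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))
```

Calling fetch_model_ids() requires a running server; model_ids can be exercised on any saved response.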

Full Changelog: v0.30.6...v0.30.7

v0.30.6

05 Jun 20:00
87cff95

New models

  • Gemma 4 QAT weights: the Gemma 4 family is now optimized with Quantization-Aware Training (QAT) to dramatically reduce memory requirements and maximize on-device performance. Look for the tags ending in -qat:
    • gemma4:e2b-it-qat
    • gemma4:e4b-it-qat
    • gemma4:12b-it-qat
    • gemma4:26b-a4b-it-qat
    • gemma4:31b-it-qat

What's Changed

  • ollama launch omp now integrates with Oh My Pi, an AI coding agent with IDE integration
  • MLX embedding layers now use NVFP4 global scale for improved quantization on Apple Silicon

Full Changelog: v0.30.5...v0.30.6

v0.30.5

04 Jun 17:00
3370ff8

What's Changed

  • Fixed the gemma4:12b floating point exception crash on x86, CUDA, Linux, and Windows systems.
  • ollama launch hermes-desktop now launches Hermes Desktop and can skip rebuilding when a packaged desktop app is already installed.
  • ollama launch hermes now supports native Windows installs through the Hermes PowerShell installer.
  • Added Cline CLI integration docs.

Full Changelog: v0.30.4...v0.30.5

v0.30.4

03 Jun 18:48
229a130

New models

  • Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

What's Changed

  • Multimodal models on the llama.cpp backend can now use Metal GPU offload on Apple Silicon, fixing an issue where they did not use the GPU and improving multimodal performance on supported Macs.
  • ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
  • ollama launch codex now cleans up old conflicting Codex profile config before launching.
  • ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
  • Pi web search setup now updates only when a newer package is available.
  • Windows cleanup now terminates the llama.cpp backend more reliably.
  • Updated the llama.cpp backend.

Known Issues

  • gemma4:12b crashes with floating point exception

Full Changelog: v0.30.3...v0.30.4

v0.30.3

03 Jun 16:35
50bbda5

New models

  • Gemma 4 12B: high-performance multimodal intelligence that runs directly on laptops, combining efficiency with advanced reasoning.

What's Changed

  • Added support for gemma4:12b.

Full Changelog: v0.30.2...v0.30.3

v0.30.2

03 Jun 00:37
4b5bdd3

What's Changed

  • ollama launch now supports Qwen Code and can guide users through installing the Cline CLI when it is missing.
  • ollama launch codex now uses an isolated launch configuration, avoiding conflicts with a user's existing Codex settings.
  • Added llama.cpp backend compatibility support for Poolside's Laguna architecture.
  • The llama.cpp backend now includes cached prompt tokens in token accounting, improving usage reporting for requests with prompt cache hits.
  • The llama.cpp backend now ignores SSE ping comments, improving streaming compatibility with newer backend behavior.
  • The llama.cpp backend now detects load stalls from server output so failed model loads surface more reliably instead of hanging.
  • Radeon 8060S integrated GPUs are now allowed by default.
  • Template details are included in logs to make troubleshooting model prompts easier.
  • Added Hermes Desktop configuration docs.
  • Fixed a build issue in the Laguna compatibility patch, restoring Laguna support in release builds.
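
In the server-sent events format, a line that begins with a colon is a comment, which servers often send as a keep-alive ping. The sketch below shows how a client can skip such lines while collecting data fields. It is an illustration of the general SSE rule, not the code Ollama uses, and the function name sse_data is hypothetical.

```python
from typing import Iterable, Iterator


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment (for example a keep-alive ping); carries no data
        if line == "":
            if buf:  # a blank line ends the current event
                yield "\n".join(buf)
                buf = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buf.append(value[1:] if value.startswith(" ") else value)
    # Data not terminated by a blank line is an incomplete event and is dropped.
```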

Full Changelog: v0.30.0...v0.30.2

v0.30.0

13 May 14:32
2c71d8d

Ollama 0.30 is now available, with improved compatibility and performance using llama.cpp. This augments the MLX engine on Apple Silicon, bringing support to a wider range of hardware.

This release brings support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models, along with faster performance on NVIDIA hardware.

Known issues:

  • laguna-xs.2 is not yet supported on Windows/Linux.
  • llama3.2-vision is not yet supported.
  • nomic-embed-text now converts inputs to lowercase, per the model card; prior Ollama versions incorrectly preserved mixed case.
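
Embeddings stored from earlier versions may differ for mixed-case inputs after this change. The following sketch finds the inputs that lowercasing would alter, so they can be re-embedded. It assumes Python's str.lower() approximates the normalization applied (the actual preprocessing may differ), and the function name affected_by_lowercasing is hypothetical.

```python
def affected_by_lowercasing(texts: list[str]) -> list[str]:
    """Return inputs whose lowercase form differs from the original."""
    return [t for t in texts if t != t.lower()]
```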