Releases: ggml-org/llama.cpp

b9592

10 Jun 20:54
ac4cdde

vendor : update LibreSSL to 4.3.2 (#24397)

Signed-off-by: Adrien Gallouët

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

b9591

10 Jun 20:20
e95dae1

Remove padding and multiple D2D copies for MTP (#24086)

  • Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and pass the snapshot count K as an op parameter instead of inferring it from state->ne[1].

Remove the padding hack and copy all emitted snapshots into the recurrent cache with a single strided ggml_cpy

  • Make GDN changes in all backends. Address review comments.

  • Fix CI build errors
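The single strided copy can be pictured with a plain C sketch that does not use the ggml API. The layout is an assumption for illustration: `k` emitted snapshots of `d` floats each are packed back to back, and the destination cache places consecutive slots `cache_stride` floats apart. `copy_snapshots_strided` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not ggml code.
 * src holds k snapshots of d floats each, packed back to back.
 * dst is a cache whose consecutive slots start cache_stride floats apart
 * (cache_stride >= d), so all snapshots land in one strided pass and src
 * never has to be padded to the cache layout first. */
void copy_snapshots_strided(float *dst, size_t cache_stride,
                            const float *src, size_t d, size_t k) {
    assert(cache_stride >= d);
    for (size_t s = 0; s < k; ++s) {
        /* each snapshot is contiguous; only its destination offset is strided */
        memcpy(dst + s * cache_stride, src + s * d, d * sizeof(float));
    }
}
```

Because only the destination offset is strided, the padding step disappears. In ggml the same idea would presumably be expressed as one `ggml_cpy` into a strided view of the cache rather than several separate copies.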

b9590

10 Jun 14:49
d2462f8

chat: fix LFM2/LFM2.5 ignoring json_schema (#24377)

The LFM2 specialized template handler only built a grammar for tool-calling,
silently ignoring json_schema from response_format.
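A minimal sketch of the decision the fix restores, assuming a handler that first checks for tools and otherwise falls back to the `json_schema` from `response_format`. The enum and `pick_grammar_source` are hypothetical, and the precedence when both tools and a schema are supplied is an assumption, not something stated in the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: which constraint a chat handler turns into a grammar. */
typedef enum {
    GRAMMAR_NONE,
    GRAMMAR_TOOL_CALLS,
    GRAMMAR_JSON_SCHEMA,
} grammar_source;

grammar_source pick_grammar_source(bool has_tools, const char *json_schema) {
    if (has_tools) {
        /* tool-calling grammar: the only case handled before the fix */
        return GRAMMAR_TOOL_CALLS;
    }
    if (json_schema != NULL && json_schema[0] != '\0') {
        /* the case that was silently dropped: response_format.json_schema */
        return GRAMMAR_JSON_SCHEMA;
    }
    return GRAMMAR_NONE;
}
```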

b9589

10 Jun 14:13
fb83cc9

CUDA: Fix ssm_scan_f32 data-races (#24360)

  • Add missing __syncthreads before reusing cub_temp_storage

__syncthreads() is required before TempStorage smem may be reused:
https://nvidia.github.io/cccl/unstable/cub/api/classcub_1_1BlockLoad.html#_CPPv4I0EN3cub9BlockLoad4LoadEv20RandomAccessIteratorRA14ItemsPerThread_1Ti

  • Add one more missing __syncthreads

Double-buffering would also work, but the alternative is simply to ensure all
threads have read smem* before it is written again in the next loop
iteration.

  • Remove unused smem from ssm_scan_f32
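The race can be reproduced in spirit on the CPU. This is an analogy in C with POSIX threads, not CUDA code: a shared `scratch` array stands in for shared memory and `pthread_barrier_wait` plays the role of `__syncthreads()`. The second barrier in each iteration corresponds to the one the fix adds: without it, a fast thread can overwrite its slot for the next iteration while a slower neighbour is still reading the previous value. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_THREADS 4
#define N_ITERS   8

static int scratch[N_THREADS];      /* stands in for shared memory */
static long results[N_THREADS];
static pthread_barrier_t barrier;   /* stands in for __syncthreads() */

static void *worker(void *arg) {
    int t = (int)(intptr_t)arg;
    long acc = 0;
    for (int it = 0; it < N_ITERS; ++it) {
        scratch[t] = it * 10 + t;                /* write phase */
        pthread_barrier_wait(&barrier);          /* all writes done before any read */
        acc += scratch[(t + 1) % N_THREADS];     /* read a neighbour's slot */
        pthread_barrier_wait(&barrier);          /* all reads done before the next write */
    }
    results[t] = acc;
    return NULL;
}

/* Runs the workers and returns the sum of all per-thread accumulators. */
long run_shared_scratch_demo(void) {
    pthread_t threads[N_THREADS];
    pthread_barrier_init(&barrier, NULL, N_THREADS);
    for (int t = 0; t < N_THREADS; ++t) {
        pthread_create(&threads[t], NULL, worker, (void *)(intptr_t)t);
    }
    long total = 0;
    for (int t = 0; t < N_THREADS; ++t) {
        pthread_join(threads[t], NULL);
        total += results[t];
    }
    pthread_barrier_destroy(&barrier);
    return total;
}
```

Build with `-pthread`. With both barriers the result is deterministic (1168 for 4 threads and 8 iterations); removing the second barrier can make it timing-dependent, which is the CPU counterpart of the shared-memory race.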

b9587

10 Jun 08:12
d2e22ed

speculative : fix "ngram-map-k4v" name in logging (#24253)

This is a non-functional change.

When using --spec-type ngram-map-k4v, the log messages at startup and
runtime say ngram-map-k. Added logic in the constructor of
common_speculative_impl_ngram_map_k to pass the correct
COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K4V when config.key_only is
false.

After this change, the log messages use the correct name.
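The fix amounts to deriving the reported type from `config.key_only` instead of always reporting the key-only variant. A tiny C sketch of that mapping follows; the enum values `SPEC_NGRAM_MAP_K` and `SPEC_NGRAM_MAP_K4V` and both functions are hypothetical stand-ins for the real `common_speculative` types.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the two speculative types involved. */
typedef enum {
    SPEC_NGRAM_MAP_K,
    SPEC_NGRAM_MAP_K4V,
} spec_type;

/* Pick the type from key_only so the logged name matches --spec-type. */
spec_type ngram_map_type(bool key_only) {
    return key_only ? SPEC_NGRAM_MAP_K : SPEC_NGRAM_MAP_K4V;
}

const char *spec_type_name(spec_type t) {
    switch (t) {
        case SPEC_NGRAM_MAP_K:   return "ngram-map-k";
        case SPEC_NGRAM_MAP_K4V: return "ngram-map-k4v";
    }
    return "unknown";
}
```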

b9585

09 Jun 18:15
d73cd07

graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357)

  • llama-graph : apply embedding scale when deepstack is not used

  • nits: remove non-existent hunyuan-vl from the tests

  • apply suggestion from @gabe-l-hart


Co-authored-by: Xuan Son Nguyen
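A sketch of the condition in the commit title, assuming the embedding scale is a single multiplier applied elementwise to the token embeddings. The function name is hypothetical, and the deepstack branch is deliberately not modelled: the sketch assumes scaling for that path is handled elsewhere, which the release note does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: scale token embeddings in place unless the deepstack path is used. */
void apply_embedding_scale(float *embd, size_t n, float scale, bool use_deepstack) {
    if (use_deepstack) {
        return; /* assumption: the deepstack path applies its own handling */
    }
    for (size_t i = 0; i < n; ++i) {
        embd[i] *= scale;
    }
}
```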

b9584

09 Jun 17:07
e25a32e

ci : fix windows release (#24369)

b9581

09 Jun 13:44
d6d0ce8

vulkan: reduce iq1 shared memory usage for mul_mm (#24287)

b9580

09 Jun 13:07
b4e3dc6

vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention (#24123)

  • vulkan: add support for valve fp16 dot2 extension

  • use macro for dot2 path choice

  • properly check for the feature

  • add dot_product abstraction to reduce preprocessor branching
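`v_dot2_f32_f16` multiplies two pairs of fp16 values and adds both products to an fp32 accumulator. The C sketch below emulates that step with plain `float` inputs for portability and wraps it in a hypothetical `dot_product` helper whose path is chosen by a `USE_DOT2` macro, illustrating how one abstraction keeps preprocessor branching out of call sites. It is not the Vulkan shader code.

```c
#include <assert.h>
#include <stddef.h>

#ifndef USE_DOT2
#define USE_DOT2 1   /* build with -DUSE_DOT2=0 to take the scalar path */
#endif

/* Emulated dot2 step: acc + a0*b0 + a1*b1 (hardware uses fp16 inputs, fp32 acc). */
static float dot2_acc(float a0, float a1, float b0, float b1, float acc) {
    return acc + a0 * b0 + a1 * b1;
}

/* Hypothetical dot_product abstraction: callers never branch on the path. */
float dot_product(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
#if USE_DOT2
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc = dot2_acc(a[i], a[i + 1], b[i], b[i + 1], acc);
    }
    if (i < n) {
        acc += a[i] * b[i];   /* odd tail element */
    }
#else
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    (void)dot2_acc;           /* unused on this path */
#endif
    return acc;
}
```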

b9578

09 Jun 12:41
9682e35

mtmd: refactor video subproc handling (#24316)

  • mtmd: refactor video subproc handling

  • Update tools/mtmd/mtmd-helper.cpp

Co-authored-by: Mikko Juola
