Added:
- New release b9592 (Latest): "vendor: update LibreSSL to 4.3.2 (#24397)" — updates the bundled LibreSSL library.
- New release b9591: "Remove padding and multiple D2D copies for MTP (#24086)" — optimises multi-token prediction (MTP) by removing unnecessary padding and redundant device-to-device (D2D) GPU memory copies.
- New release b9590: "chat: fix LFM2/LFM2.5 ignoring json_schema (#24377)" — fixes a bug where chat handling for LFM2 and LFM2.5 models ignored a provided json_schema.
- New release b9589: "CUDA: Fix ssm_scan_f32 data-races (#24360)" — adds missing synchronisation barriers to fix data races in the CUDA ssm_scan kernel.

Changed:
- The repository’s open issue count increased from 705 to 720.
- Release b9587: no longer carries the "Latest" badge; its description now reads "speculative : fix 'ngram-map-k4v' name in logging (#24253)"; its asset entries (download files and source code archives) were removed (see Removed); its reaction count increased from 2 to 5 (added carlosjln, LarsKort, skieop).
- Release b9585: reaction count increased from 5 to 6 (added carlosjln).
- Release b9584: reaction count increased from 2 to 4 (added jreng02, YektaDev).
- Release b9581: reaction count increased from 2 to 3 (added ross-rosario).

Removed:
- Entire release b9577: "server: log prompts to directory (#22031)" — added a feature to log prompts to a directory.
- Entire release b9575: "ggml : add GGML_OP_COL2IM_1D (#24206)" — added the GGML_OP_COL2IM_1D (1D col2im) operation to the ggml library, in support of 1D transposed convolution.
- Entire release b9574: "server : do not clear slots without unified KV cache (#24190)" — stopped the server from clearing slots when a unified KV cache is not in use.
- Entire release b9573: "models : fix plamo2 attention_key/value_length regression (#24317)" — fixed a regression in the attention key/value length handling for the Plamo2 model.
- Old asset download entries for release b9587, including cudart-llama-bin-win-cuda-12.4-x64.zip, cudart-llama-bin-win-cuda-13.3-x64.zip, llama-b9587-bin-*.tar.gz for various platforms, and the Source code zip/tar.gz archives, together with their timestamps.