MTP-only GGUF subsets: Qwen3.5/3.6
These are just MTP-only (multi-token prediction) GGUF subsets of the Qwen3.5/3.6 Medium/Large models (27B and above). They are meant to accelerate token generation for Qwen-based models whose GGUF files do not include MTP tensors.
They are nothing more than that, but I hope they help with experimenting on various Qwen3.5/3.6-based fine-tunes.
I originally created some of these MTP-only subsets to accelerate token generation of trohrbaugh/Qwen3.5-122B-A10B-heretic (a self-converted version), but the main reason I published them is Ornith-1.0-35B.
- To show exactly how Qwen3.5/3.6's MTP tensors can be embedded inside an existing GGUF file (and to make doing so easy).
  I recently found that one of the Ornith-1.0-35B quants embeds MTP tensors, stating that they come from Qwopus3.6-35B-A3B, and... those MTP tensors are just the original Qwen ones.
- To make MTP-only models with dual uses available (1. as a separate draft model file / 2. as a model file for grafting).
  Some MTP-only subsets (in GGUF format) are small but usable only for grafting (i.e. transplanting MTP-related tensors) and cannot be used as a separate draft model file (which llama.cpp supports via `--model-draft` on llama-server). I hope that publishing easy-to-test model files makes experimenting with Qwen3.5/3.6-based fine-tunes easier.
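
The first use (a separate draft model file) can be sketched as follows. This is only a minimal illustration, assuming `llama-server` is on your `PATH`. The file names `fine-tune.gguf` and `qwen-mtp-only.gguf` and the helper `build_llama_server_command` are hypothetical placeholders, not names from this repository. The sketch only builds and prints the command line; any additional speculative-decoding options your llama.cpp build may need are not covered here.

```python
import shlex


def build_llama_server_command(main_model, mtp_draft_model, server_binary="llama-server"):
    """Build an argv list that loads a main model plus a separate draft model.

    --model and --model-draft are llama-server options; the paths passed in
    are placeholders chosen for this example (assumption, not real files).
    """
    return [
        server_binary,
        "--model", main_model,             # the fine-tune that lacks MTP tensors
        "--model-draft", mtp_draft_model,  # the MTP-only subset used as a draft model
    ]


# Hypothetical file names, for illustration only.
command = build_llama_server_command("fine-tune.gguf", "qwen-mtp-only.gguf")

# Print a shell-safe command line that can be copied into a terminal.
print(shlex.join(command))
```

To actually start the server from Python, the same list could be passed to `subprocess.run(command)`; building the list first keeps paths with spaces safely quoted.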
Edit (2026-07-01): An MTP-only GGUF subset of Qwen3.5-9B has been added (since there are many fine-tunes based on this model; there is no plan for 4B or smaller).