How to improve RAM offload?

I have only 12GB of VRAM (RTX 3060) but enough RAM to run Qwen3.6 27B Q4 with offload. I don't expect maximum performance, but why is DRAM bandwidth only around 30GB/s (per HWiNFO) during inference with dual-channel 5200 RAM? Token generation (TG) is 3.12 tok/sec with an 18K-token result.
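For reference, here is a rough sketch of the theoretical peak for this memory setup, so the 30GB/s reading has something to compare against. It assumes "dual channel 5200" means DDR5-5200 at 5200 MT/s with a 64-bit (8-byte) bus per channel; the function name and the 8-byte width are my assumptions, not something HWiNFO reports.

```python
def peak_dram_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel moves bus_bytes per transfer (64-bit channel).
    """
    return mega_transfers_per_s * 1e6 * channels * bus_bytes / 1e9


peak = peak_dram_bandwidth_gbs(5200)   # dual-channel DDR5-5200 (assumed)
observed = 30.0                        # GB/s, the HWiNFO reading from the post
utilization = observed / peak

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"observed share of peak: {utilization:.0%}")
```

Under those assumptions the theoretical peak is about 83GB/s, so 30GB/s is well under half of it. Real-world sustained bandwidth is always below the theoretical figure, so this is only an upper bound, not what the system should actually reach.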

I expected it to be slow, but I can't figure out where the bottleneck is: is it how LM Studio works, or do I need a better CPU (I have a 7500F)? Of course dual 3090s would do the job, but it is what it is for now.

I also tried a smaller prompt with 6 CPU threads, Q8 KV cache, and GPU offload set to 37 layers: TG was 4.95 tok/sec and bandwidth was 30-35GB/s.
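One way to read these numbers is to divide the measured bandwidth by the generation speed, which gives a rough figure for how much data is pulled from RAM per generated token. This is only a back-of-the-envelope sketch: it assumes the HWiNFO reading is dominated by inference traffic, and the helper name is made up for illustration.

```python
def gb_read_per_token(bandwidth_gbs, tokens_per_s):
    """Approximate GB of DRAM traffic per generated token."""
    return bandwidth_gbs / tokens_per_s


# (label, DRAM bandwidth in GB/s, tokens per second), all taken from the post
runs = [
    ("18K-token run", 30.0, 3.12),
    ("small prompt, low end", 30.0, 4.95),
    ("small prompt, high end", 35.0, 4.95),
]

per_token = {label: gb_read_per_token(bw, tps) for label, bw, tps in runs}

for label, gb in per_token.items():
    print(f"{label}: ~{gb:.1f} GB per token")
```

If the GB-per-token figure is close to the size of the weights and KV cache that stay in RAM, then token speed is tied directly to whatever bandwidth the CPU side actually sustains, and the question becomes why that number stays near 30GB/s.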

Original post: reddit.com
