Agents-A1-Q8_0-GGUF works pretty well for me (anecdotal feedback)

For the last day or so I've been using Agents A1 Q8 InternScience/Agents-A1-Q8_0-GGUF

on my M1 Max mac (64GB) just like this:

llama-server -hf InternScience/Agents-A1-Q8_0-GGUF --host 0.0.0.0 --port 8080 --temp 0.85 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 1.1 --repeat-penalty 1.0

(these are the parameters they recommend)

With full 262K context available I am getting about 500 t/s pp and about 40 t/s tg. I've been using opencode with it and it seems to be roughly Qwen level - but it's early days.

I assume there are other parameters I can tweak, I just haven't looked yet.

Anyone else playing with it?

原始关键词#anecdotal#feedback#agents#pretty#works#gguf

查看原文reddit.com

单一来源，暂无交叉验证