I built a two-model proxy that cuts up to 70% of Claude Code's tokens, benchmarked against Headroom

I was hitting the limits on my $100 subscription every day, even with the doubled (2x) usage allowance that's ending soon. So I started thinking about how to cut the context down properly without losing anything that matters.

That became a proxy I built with Claude Code over the last month. Two models do the work: one clears out the dead tokens, the other compresses what's left. Same model upstream, same answers, up to 70% fewer input tokens.
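The post doesn't say how either model decides what counts as dead or how it compresses, so what follows is not the author's proxy. It is a minimal sketch of how a two-stage pipeline could be shaped, under stated assumptions. Cheap heuristics stand in for both models: stage one drops tool results for a file that was read again later, and stage two truncates long tool outputs. `Message`, `prune_stale_reads`, `make_truncator` and `run_pipeline` are all hypothetical names, and whitespace word counts stand in for a real tokenizer.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:
    role: str                   # "user", "assistant", or "tool"
    content: str
    file: Optional[str] = None  # hypothetical: file a tool result was read from


Stage = Callable[[list[Message]], list[Message]]


def count_tokens(messages: list[Message]) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return sum(len(m.content.split()) for m in messages)


def prune_stale_reads(messages: list[Message]) -> list[Message]:
    """Stage 1 stand-in for the 'dead token' model: drop superseded file reads."""
    last_read: dict[str, int] = {}
    for i, m in enumerate(messages):
        if m.role == "tool" and m.file is not None:
            last_read[m.file] = i
    # Keep everything except tool results for a file that is read again later.
    return [
        m for i, m in enumerate(messages)
        if not (m.role == "tool" and m.file is not None and last_read[m.file] != i)
    ]


def make_truncator(max_words: int) -> Stage:
    """Stage 2 stand-in for the compression model: shorten long tool outputs."""
    def truncate(messages: list[Message]) -> list[Message]:
        out = []
        for m in messages:
            words = m.content.split()
            if m.role == "tool" and len(words) > max_words:
                kept = " ".join(words[:max_words])
                m = replace(m, content=f"{kept} [... {len(words) - max_words} words elided]")
            out.append(m)
        return out
    return truncate


def run_pipeline(messages: list[Message], stages: list[Stage]) -> list[Message]:
    # Each stage sees the output of the previous one: prune first, then compress.
    for stage in stages:
        messages = stage(messages)
    return messages


history = [
    Message("user", "fix the failing test in app.py"),
    Message("tool", "def main(): " + "x " * 200, file="app.py"),
    Message("assistant", "I changed the assert on line 12"),
    Message("tool", "def main(): " + "y " * 200, file="app.py"),
]
slim = run_pipeline(history, [prune_stale_reads, make_truncator(50)])
print(count_tokens(history), "->", count_tokens(slim))  # prints "417 -> 67"
```

Running pruning before compression means the second stage never pays to compress content the first stage would have thrown away anyway. In a real proxy each stage would be a model call, and the compressed history would then be forwarded to the unchanged upstream model.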

A few other tools do a version of this, so I benchmarked mine against one of them, Headroom, on real coding sessions:

| Tool | Tokens cut | $ saved |
|---|---|---|
| Mine | 54% | 37% |
| Headroom | 15% | 14% |

How each one gets there:

- Headroom compresses tool output and logs with heuristics plus one small model, and keeps the original retrievable.

- Mine runs the whole context through two models, not just one slice. That's the difference in the table.
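The "keeps the original retrievable" part of the Headroom bullet describes a general pattern: shrink what goes into the prompt, but store the full text so it can be fetched back if the agent needs it. The sketch below shows only that generic pattern, not Headroom's implementation or API. The `RetrievableCompressor` class, its key format, and the keep-the-first-N-lines heuristic are all hypothetical stand-ins for its heuristics and small model.

```python
import hashlib


class RetrievableCompressor:
    """Hypothetical sketch: shorten long tool output but keep the full text by key."""

    def __init__(self, head_lines: int = 5):
        self.head_lines = head_lines
        self._store: dict[str, str] = {}

    def compress(self, output: str) -> str:
        lines = output.splitlines()
        if len(lines) <= self.head_lines:
            return output  # short enough; pass through untouched
        key = hashlib.sha256(output.encode()).hexdigest()[:12]
        self._store[key] = output  # removed from the prompt, but not discarded
        head = "\n".join(lines[: self.head_lines])
        omitted = len(lines) - self.head_lines
        return f"{head}\n[{omitted} more lines omitted; retrieve with key {key}]"

    def retrieve(self, key: str) -> str:
        return self._store[key]


log = "\n".join(f"test_{i} PASSED" for i in range(100))
c = RetrievableCompressor(head_lines=3)
short = c.compress(log)
key = short.rsplit("key ", 1)[1].rstrip("]")
assert c.retrieve(key) == log  # the full log is still recoverable
```

Because a pattern like this only touches individual tool outputs, the rest of the conversation passes through at full size. That is consistent with the post's point that compressing the whole context, rather than one slice of it, is where the gap in the table comes from.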

I think there's a lot more to squeeze here. I'm aiming for higher compression still, but I need real sessions to find where it breaks and what to optimize next, so I'd love your numbers, especially the cases where it barely helps or makes things worse. And if anyone here has worked on context compression for coding agents, I'd really like to compare notes.

Source: reddit.com