Run Claude Code against a local Gemma 4 or Qwen 3.6 - no API key, no cost, works on any Apple Silicon Mac

推荐理由

这条记录涉及编程工具或代码能力更新，适合开发者评估工作流变化和可复用价值。

If you have an Apple Silicon Mac you can run Claude Code completely locally (and free) by pointing it at a local server. Here's how:

Setup (2 minutes) brew tap ddalcu/mlx-serve https://github.com/ddalcu/mlx-serve brew install --cask mlx-core # GUI menu bar app brew install mlx-serve # CLI server only mlx-serve run gemma-4-e4b-it # downloads + starts the server (not needed if you use GUI)

Then launch Claude Code with:

ANTHROPIC_BASE_URL=http://localhost:11434 \ ANTHROPIC_API_KEY=local \ ANTHROPIC_DEFAULT_MODEL=mlx-serve \ claude

That's it. Claude Code streams, tool calls, thinking blocks, multi-turn - all work against the local model via the Anthropic Messages API.

What runs well locally

- Gemma 4 E4B 4-bit (recommended starting point, ~105 tok/s decode on M4 Max)

- Qwen 3.6 27B 4-bit with native MTP spec-decode (~36 tok/s, 1.43x faster on code tasks)

- Qwen 3.5 4B/9B for faster iteration cycles

Full walkthrough + tips for which models work best for coding tasks: https://mlxserve.com/claude-code-local/

The server is mlx-serve - MIT, no Python required, single binary. brew install mlx-serve

GitHub: https://github.com/ddalcu/mlx-serve

主题标签ClaudeQwen开源代码端侧推理

原始关键词#silicon#apple#gemma#local#works

查看原文reddit.com

单一来源，暂无交叉验证