Run Claude Code against a local Gemma 4 or Qwen 3.6 - no API key, no cost, works on any Apple Silicon Mac
这条记录涉及编程工具或代码能力更新,适合开发者评估工作流变化和可复用价值。
If you have an Apple Silicon Mac you can run Claude Code completely locally (and free) by pointing it at a local server. Here's how:
Setup (2 minutes) brew tap ddalcu/mlx-serve https://github.com/ddalcu/mlx-serve brew install --cask mlx-core # GUI menu bar app brew install mlx-serve # CLI server only mlx-serve run gemma-4-e4b-it # downloads + starts the server (not needed if you use GUI)
Then launch Claude Code with:
ANTHROPIC_BASE_URL=http://localhost:11434 \ ANTHROPIC_API_KEY=local \ ANTHROPIC_DEFAULT_MODEL=mlx-serve \ claude
That's it. Claude Code streams, tool calls, thinking blocks, multi-turn - all work against the local model via the Anthropic Messages API.
What runs well locally
- Gemma 4 E4B 4-bit (recommended starting point, ~105 tok/s decode on M4 Max)
- Qwen 3.6 27B 4-bit with native MTP spec-decode (~36 tok/s, 1.43x faster on code tasks)
- Qwen 3.5 4B/9B for faster iteration cycles
Full walkthrough + tips for which models work best for coding tasks: https://mlxserve.com/claude-code-local/
The server is mlx-serve - MIT, no Python required, single binary. brew install mlx-serve
GitHub: https://github.com/ddalcu/mlx-serve