We run 14 local-first agent harnesses with all working memory in TOON instead of Markdown — measured benchmarks (including the number that doesn't favor us) and the upstream bug report we got wrong
We moved our agent fleet's working memory off Markdown and onto TOON (Token-Oriented Object Notation) in December 2025 and just wrote up what 14 harnesses taught us.
The honest numbers (tiktoken o200k, 100 uniform CRM records):
- TOON 2,068 tokens vs row-object JSON 3,074 (-33%) vs pretty-printed JSON 4,973 (-58%); each percentage is TOON's size relative to that baseline
- A tight Markdown *table* is nearly equal to TOON (-4%) — the win is vs JSON, not vs a hand-optimized table
- Columnar JSON beats TOON by ~4% — we publish that too. TOON's edge over columnar is readability + declared `[N]`/field counts a validator can check, not tokens
- At a 2,500-token budget: TOON fits all 100 rows → correct answer; row-object JSON is cut at 81 rows → confidently wrong. Same data, same question.
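For anyone who hasn't seen the format, here is a rough Python sketch of the two ideas in the bullets above: a tabular block whose header declares the row count and field list, and a check that fails when rows go missing. It is not taken from the repro repo. The function names (`encode_table`, `validate_table`, `rows_that_fit`) and the sample records are made up for illustration, and the encoder assumes flat records with plain values, so it skips TOON's quoting and escaping rules entirely. The last helper only redoes the budget arithmetic under the assumption that every row costs the same number of tokens.

```python
import re

# Header shape assumed here: key[N]{field1,field2,...}:
HEADER = re.compile(r"^(?P<key>\w+)\[(?P<n>\d+)\]\{(?P<fields>[^}]*)\}:$")


def encode_table(key, records):
    """Encode uniform flat records as a TOON-style tabular block.

    Simplification (assumption): values contain no commas, quotes, or
    newlines, so no quoting or escaping is performed.
    """
    fields = list(records[0])
    header = key + "[" + str(len(records)) + "]{" + ",".join(fields) + "}:"
    rows = ["  " + ",".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join([header] + rows)


def validate_table(text):
    """Return a list of problems; an empty list means the counts agree."""
    lines = text.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return ["missing or malformed header"]
    declared = int(match["n"])
    width = len(match["fields"].split(","))
    rows = [ln.strip() for ln in lines[1:] if ln.strip()]
    problems = []
    if len(rows) != declared:
        problems.append(f"header declares {declared} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        got = len(row.split(","))
        if got != width:
            problems.append(f"row {i} has {got} fields, expected {width}")
    return problems


def rows_that_fit(total_tokens, total_rows, budget):
    """Whole rows that fit in a token budget, assuming equal cost per row."""
    return min(total_rows, budget * total_rows // total_tokens)


# Hypothetical sample data, not the benchmark's CRM records.
records = [
    {"id": 1, "name": "Ada", "stage": "lead"},
    {"id": 2, "name": "Grace", "stage": "won"},
    {"id": 3, "name": "Linus", "stage": "lost"},
]
doc = encode_table("contacts", records)
print(doc)
print(validate_table(doc))

# Dropping the last row makes the declared count disagree with the body.
truncated = "\n".join(doc.splitlines()[:-1])
print(validate_table(truncated))

# Budget arithmetic from the post's token totals.
print(rows_that_fit(3074, 100, 2500))  # row-object JSON
print(rows_that_fit(2068, 100, 2500))  # TOON
```

With the token totals quoted above, the uniform-row estimate gives 81 rows for row-object JSON and all 100 for TOON at a 2,500-token budget, which lines up with the cut-off reported in the last bullet. Row-object JSON truncated at an object boundary carries no declared length to compare against, while a TOON block missing its tail fails the count check.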
The part r/LocalLLaMA might enjoy most: we filed a bug against the TOON CLI, the maintainer closed it as not-a-bug, and he was right. Our dialect had drifted off spec, and our own machine memory had recorded the correct answer months earlier. Nobody pulled the thread until a human editor read the draft.
Article: https://netstatz.com/toon-structured-machine-memory/
Repro repo (scripts + every artifact, MIT): https://github.com/ianbmacdonald/article-toon-benchmarks
Everything runs local — the harnesses drive local models for bulk pipeline steps (Qwen3.6 on a lemonade fleet) with frontier models only where reasoning depth pays. Happy to answer questions; the benchmarks take ~30s to re-run with uv.