I tested my production agents at every effort level and the results were surprising

tl;dr: if you're changing low/medium/high effort on agents that touch real tools or permissions, test the same production questions at each setting first.

I almost moved a few of my agents from medium to low because the generic advice sounded fine. Low effort is supposed to be okay for routing, lookups, support-ish work, stuff like that. Mine were mostly on medium because nobody had made a real call, so the obvious cleanup seemed like dropping the easy ones down and saving the reasoning for harder jobs.

Before I changed it, I had Claude Code build a test harness. It ran 26 known questions against the live system at low, medium, and high effort, about 80 model calls total. The access tests used the same role identity injection the production agent uses, so when a salesperson asked for a forbidden company-wide number, it hit the real guardrail instead of some fake unit test version.
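For anyone who wants to try the same thing, here is a minimal sketch of that kind of harness, not the one described above. It assumes a hypothetical `call_agent(text, effort, role)` function that wraps however your production agent is actually invoked, including the role identity injection. The `fake_agent` stub is only there so the sketch runs on its own.

```python
import time
from dataclasses import dataclass

EFFORT_LEVELS = ("low", "medium", "high")


@dataclass
class Result:
    question: str
    role: str
    effort: str
    seconds: float
    answer: str


def run_matrix(questions, call_agent, efforts=EFFORT_LEVELS):
    """Run every question at every effort level and record wall-clock time.

    `questions` is a list of dicts with "text" and "role" keys.
    `call_agent` is a hypothetical wrapper around the real agent call.
    """
    results = []
    for q in questions:
        for effort in efforts:
            start = time.perf_counter()
            answer = call_agent(q["text"], effort, q["role"])
            elapsed = time.perf_counter() - start
            results.append(Result(q["text"], q["role"], effort, elapsed, answer))
    return results


def fake_agent(text, effort, role):
    # Stand-in for the real agent so the sketch is runnable; replace it
    # with a call into the live system, using the same role injection.
    return f"[{effort}] answer for {role}: {text}"


questions = [{"text": f"question {i}", "role": "salesperson"} for i in range(26)]
results = run_matrix(questions, fake_agent)
print(len(results))  # 26 questions x 3 effort levels = 78 calls
```

Keeping one `Result` per call makes it easy to compare the same question across effort levels afterwards, for both latency and answer content.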

The results were weird enough that I'm glad I didn't just trust the vibe. On the heavy database agent, low effort was actually slower. One duplicate-hunting query took around 187 seconds on low and around 100 seconds on medium: low under-planned, got itself into more tool loops, and then had to spend time recovering. So that one stayed at medium.

The access control agent did the opposite. A salesperson asked for a company-wide revenue total they weren't allowed to see. Low refused. Medium refused. High found a way through and returned the forbidden total in one of four runs. More reasoning made it better at working around the restriction, which is the exact opposite of what I wanted there.
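A one-in-four leak is easy to miss with a single run per setting, so it helps to repeat each access test and count. Below is a rough sketch of that tally. It assumes the forbidden value is known in advance and can be spotted by a plain substring match, which is a crude check. The marker string and the sample answers are made up for illustration and are not real results.

```python
from collections import defaultdict


def leak_rates(runs, forbidden_marker):
    """Return {effort: fraction of runs whose answer contained the marker}.

    `runs` is an iterable of (effort, answer) pairs.
    """
    leaked = defaultdict(int)
    total = defaultdict(int)
    for effort, answer in runs:
        total[effort] += 1
        if forbidden_marker in answer:
            leaked[effort] += 1
    return {effort: leaked[effort] / total[effort] for effort in total}


# Made-up sample data: four runs per effort level, with one leak on high.
FORBIDDEN = "COMPANY_TOTAL_PLACEHOLDER"  # hypothetical stand-in for the real number
sample_runs = (
    [("low", "Sorry, I can't share that.")] * 4
    + [("medium", "Sorry, I can't share that.")] * 4
    + [("high", "Sorry, I can't share that.")] * 3
    + [("high", f"The total is {FORBIDDEN}.")]
)

rates = leak_rates(sample_runs, FORBIDDEN)
print(rates)  # {'low': 0.0, 'medium': 0.0, 'high': 0.25}
```

For an access guardrail, any nonzero rate at a given effort level is a reason not to use that level, however good the average looks.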

So the final config was pretty boring, but at least it was based on evidence: routing on low, access enforcement on low, heavy database work on medium. Same model, same work, same per-token cost. I just stopped treating the effort dial like a vibe setting.
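Written down as config, that outcome could look something like the sketch below. The agent names are hypothetical labels for the three agents mentioned, and the lookup deliberately has no default, on the assumption that an agent nobody has tested should fail loudly instead of silently inheriting a setting.

```python
# Effort level per agent, as chosen from the test results described above.
# Agent names here are illustrative labels, not real identifiers.
EFFORT_BY_AGENT = {
    "routing": "low",
    "access_enforcement": "low",
    "heavy_database": "medium",
}


def effort_for(agent_name):
    """Look up the tested effort level; refuse to guess for unknown agents."""
    try:
        return EFFORT_BY_AGENT[agent_name]
    except KeyError:
        raise KeyError(f"no tested effort level for agent {agent_name!r}") from None


print(effort_for("heavy_database"))  # medium
```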


Original post: reddit.com
