GPT-5.5 Codex reasoning-token clustering may be leading to degraded performance

Summary

I found an aggregate pattern in Codex token_count metadata: gpt-5.5 responses disproportionately land at exactly reasoning_output_tokens = 516, with additional fixed-boundary spikes around 1034 and 1552.

This appears model-specific and coincides with lower overall reasoning-token intensity, which may help explain degraded performance on complex/high-stakes Codex tasks.
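One way to look for this kind of pattern in your own telemetry is to count each exact reasoning_output_tokens value and flag values whose count is far above that of nearby values. The sketch below shows one way to do that. It is not the author's method. The helper find_spikes is hypothetical, and the window, ratio, and min_count thresholds are arbitrary assumptions.

```python
from collections import Counter
from statistics import median


def find_spikes(values, window=10, ratio=5.0, min_count=3):
    """Hypothetical helper: flag exact token values far above their neighbors.

    values: iterable of reasoning_output_tokens integers, one per response.
    window: number of integer values on each side that form the neighborhood.
    ratio: flag a value when its count exceeds ratio * neighborhood median.
    min_count: skip values seen fewer times than this.
    """
    counts = Counter(values)
    spikes = []
    for value, count in sorted(counts.items()):
        if count < min_count:
            continue
        # Counts of the surrounding values (unobserved values count as 0).
        neighbors = [
            counts.get(v, 0)
            for v in range(value - window, value + window + 1)
            if v != value
        ]
        # Floor the baseline at 1 so an all-zero neighborhood is not divided away.
        baseline = max(median(neighbors), 1)
        if count > ratio * baseline:
            spikes.append((value, count, baseline))
    return spikes


# Synthetic example (not real Codex data): one response at every value
# from 200 to 1799, plus 50 extra responses at each reported spike value.
sample = list(range(200, 1800)) + [516] * 50 + [1034] * 50 + [1552] * 50
print([v for v, _, _ in find_spikes(sample)])  # [516, 1034, 1552]
```

Comparing each value against a local median, rather than a global mean, keeps a smooth bulk of the distribution from being flagged. Only isolated exact values stand out.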

This is related to #29353, which reported a task-level reproduction where gpt-5.5 runs ending at exactly 516 reasoning tokens returned the wrong answer. This issue adds aggregate evidence across a larger Feb-Jun window.

I am not claiming this proves hidden chain-of-thought truncation. The narrower claim is that Codex telemetry shows a GPT-5.5-specific fixed-token clustering anomaly that looks consistent with thresholded reasoning-budget behavior.
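The three reported spike locations are evenly spaced: 1034 - 516 = 518 and 1552 - 1034 = 518. A regular step like this is at least consistent with the "fixed-boundary" reading. The issue does not say what the 518-token spacing corresponds to, if anything. The small hypothetical check below tests whether spike values found in other data share one constant gap.

```python
def common_step(points):
    """Return the constant gap between sorted points, or None if the gaps differ."""
    pts = sorted(points)
    # Set of distinct gaps between consecutive points.
    gaps = {b - a for a, b in zip(pts, pts[1:])}
    # Exactly one distinct gap means the points are evenly spaced.
    return gaps.pop() if len(gaps) == 1 else None


print(common_step([516, 1034, 1552]))  # 518
```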

Environment

- Product: Codex

- Model most implicated: gpt-5.5

- Data source: Codex token_count metadata

- Time window analyzed: Feb 1-Jun 27, 2026 UTC

- Related issue: gpt-5.5 xhigh sometimes short-circuits with reasoning_output_tokens=516 and wrong final_answer in Codex Desktop #29353
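The issue does not describe how the token_count records were extracted. The minimal sketch below assumes the records have already been flattened into JSON Lines with the hypothetical fields timestamp, model, type, and reasoning_output_tokens. The real Codex log layout may differ, so treat these field names as placeholders. The sketch applies the gpt-5.5 model filter and the Feb 1-Jun 27, 2026 UTC window listed above.

```python
import json
from datetime import datetime, timezone

# Window from the issue: Feb 1 to Jun 27, 2026 UTC, with Jun 27 included.
START = datetime(2026, 2, 1, tzinfo=timezone.utc)
END = datetime(2026, 6, 28, tzinfo=timezone.utc)  # exclusive upper bound


def reasoning_tokens(lines, model="gpt-5.5"):
    """Yield reasoning_output_tokens from JSONL token_count records.

    ASSUMED (hypothetical) record shape; check it against your own logs:
    {"timestamp": "2026-03-01T12:00:00Z", "model": "gpt-5.5",
     "type": "token_count", "reasoning_output_tokens": 516}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("type") != "token_count" or rec.get("model") != model:
            continue
        # Accept a trailing "Z" by rewriting it as an explicit UTC offset.
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
        if START <= ts < END:
            yield rec["reasoning_output_tokens"]


sample = [
    '{"timestamp": "2026-03-01T12:00:00Z", "model": "gpt-5.5", "type": "token_count", "reasoning_output_tokens": 516}',
    '{"timestamp": "2026-07-01T12:00:00Z", "model": "gpt-5.5", "type": "token_count", "reasoning_output_tokens": 900}',
    '{"timestamp": "2026-03-02T08:00:00Z", "model": "other-model", "type": "token_count", "reasoning_output_tokens": 700}',
]
print(list(reasoning_tokens(sample)))  # [516]
```

In the synthetic sample, the second record falls outside the window and the third belongs to a placeholder model, so only the first record is kept.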

Evidence

| Metric | Value |
|---|---|
| Response-level token records analyzed | 390,195 |
| Sessions represented | 865 |
| Events at exactly reasoning_output_tokens = 516 | 3,363 |
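To put the table in proportion, 3,363 of 390,195 records (about 0.86%) sit on a single exact value. The sessions average roughly 451 records each. The table as captured here gives no expected count for neighboring values. This arithmetic alone therefore does not show how large the spike is relative to the background.

```python
records = 390_195   # response-level token records analyzed
sessions = 865      # sessions represented
exact_516 = 3_363   # events at exactly reasoning_output_tokens = 516

share = exact_516 / records
print(f"{share:.2%} of records land exactly on 516")
# prints: 0.86% of records land exactly on 516
print(f"{records / sessions:.0f} records per session on average")
# prints: 451 records per session on average
```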

Source: github.com (single source; not yet cross-verified).