I cut my AI dictionary app’s first streamed result from 13.3s to 3.0s by making it stop overthinking the word “apple”

I’m building UrLingo, a personal dictionary/wordbook app for that very specific human ritual where you search “[word] meaning,” understand it for 14 seconds, and then your brain quietly throws it into the ocean.

The core flow is simple:

User searches a word → backend checks auth/quota/preferences → OpenAI generates a structured dictionary entry → frontend streams (will come to the streaming part in a bit) the response.

Simple. Beautiful. Innocent.

Except my app was taking 13 seconds before showing the first useful streamed output.

Initial numbers were rough:

OpenAI TTFT: 8296ms

First frontend OpenAI chunk: 13274ms

Hidden reasoning tokens: 1088

Yes. 1088 hidden reasoning tokens.

For a dictionary response.

Apparently the model needed to assemble the Seven Kingdoms before explaining what a word means.

After profiling and fixing the path, the latest batch looks like this:

OpenAI TTFT p50/p95: 1247ms / 3514ms

First frontend OpenAI chunk p50/p95: 3038ms / 4873ms

Hidden reasoning tokens: 0

Priority tier: true on all runs

So roughly:

OpenAI TTFT p50: 6.7x faster

First frontend chunk p50: 4.4x faster

First frontend chunk p95: 2.7x faster

Reasoning overhead: eliminated

What actually helped:

- Removed reasoning overhead for simple dictionary lookups. No need for Socrates to define “serendipity.”

主题标签OpenAI

原始关键词#overthinking#dictionary#streamed#making#result#apple

查看原文reddit.com

单一来源，暂无交叉验证