I cut my AI dictionary app’s first streamed result from 13.3s to 3.0s by making it stop overthinking the word “apple”
I’m building UrLingo, a personal dictionary/wordbook app for that very specific human ritual where you search “[word] meaning,” understand it for 14 seconds, and then your brain quietly throws it into the ocean.
The core flow is simple:
User searches a word → backend checks auth, quota, and preferences → OpenAI generates a structured dictionary entry → frontend streams the response (more on the streaming part in a bit).
Simple. Beautiful. Innocent.
Except my app was taking 13 seconds before showing the first useful streamed output.
Initial numbers were rough:
OpenAI TTFT: 8296ms
First frontend OpenAI chunk: 13274ms
Hidden reasoning tokens: 1088
Yes. 1088 hidden reasoning tokens.
For a dictionary response.
Apparently the model needed to assemble the Seven Kingdoms before explaining what a word means.
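For anyone wanting to capture a "first chunk" number like the ones above, here is a minimal sketch of timing the first non-empty chunk of any streamed response. The `FirstChunkTimer` class is a hypothetical helper for illustration, not the app's actual instrumentation, and it assumes the stream is a plain iterable of text chunks.

```python
import time


class FirstChunkTimer:
    """Records how long the first non-empty chunk takes to arrive (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.started_at = clock()      # start the stopwatch when the request begins
        self.first_chunk_ms = None     # filled in once, on the first non-empty chunk

    def wrap(self, chunks):
        """Pass chunks through unchanged while noting when the first real one shows up."""
        for chunk in chunks:
            if self.first_chunk_ms is None and chunk:
                self.first_chunk_ms = (self._clock() - self.started_at) * 1000
            yield chunk
```

Wrapping the stream on the backend gives a TTFT-style number, and doing the same around the frontend's reader gives the "first frontend chunk" number. The gap between the two is everything the model is not responsible for.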
After profiling and fixing the path, the latest batch looks like this:
OpenAI TTFT p50/p95: 1247ms / 3514ms
First frontend OpenAI chunk p50/p95: 3038ms / 4873ms
Hidden reasoning tokens: 0
Priority tier: true on all runs
So roughly, compared with that initial run:
OpenAI TTFT p50: 6.7x faster
First frontend chunk p50: 4.4x faster
First frontend chunk p95: 2.7x faster
Reasoning overhead: eliminated
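The ratios are just the initial-run numbers divided by the latest percentiles. A small sketch of that arithmetic, plus a nearest-rank percentile helper for turning a batch of per-run latencies into p50/p95. Both function names and the nearest-rank method are assumptions for illustration. The post does not say how the batch was summarized.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def speedup(before_ms, after_ms):
    """How many times faster the new latency is, rounded to one decimal."""
    return round(before_ms / after_ms, 1)


print(speedup(8296, 1247))    # OpenAI TTFT p50 -> 6.7
print(speedup(13274, 3038))   # first frontend chunk p50 -> 4.4
print(speedup(13274, 4873))   # first frontend chunk p95 -> 2.7
```

One caveat about the comparison: the "before" side is a single initial measurement, while the "after" side is a percentile over a batch, so the ratios are a rough guide, not a controlled benchmark.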
What actually helped:
- Removed reasoning overhead for simple dictionary lookups. No need for Socrates to define “serendipity.”
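A hedged sketch of what "removing reasoning overhead" can look like in request form. It assumes the OpenAI Responses API, where `reasoning.effort`, `service_tier`, and `stream` are request parameters. The helper name, the prompt text, and the choice of `"minimal"` are illustrative assumptions, and which effort values are accepted depends on the model.

```python
def build_lookup_request(word: str, model: str, effort: str = "minimal") -> dict:
    """Build keyword arguments for a streamed, low-reasoning dictionary lookup.

    Hypothetical helper: the prompt and the defaults are assumptions, not the app's real code.
    """
    return {
        "model": model,                          # caller supplies the model name
        "input": f"Define the word: {word}",     # placeholder prompt for illustration
        "reasoning": {"effort": effort},         # keep hidden reasoning as small as the model allows
        "service_tier": "priority",              # matches the priority tier noted in the results
        "stream": True,                          # stream so the first chunk can render early
    }
```

With the official Python SDK this would be passed along as `client.responses.create(**build_lookup_request("apple", model=...))`. To confirm the change worked, check the reasoning token count in the response's usage details. It should drop from the 1088 seen earlier to 0.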