End of an Agony. My team built a real production service that uses LLMs to earn money, and now we are so happy that it will die. Here are some of my final "experiences".

Hello everyone.

I posted in this sub about building a production service about 8 months ago. Here is the link to my previous post. The idea stayed the same: we wanted to build a real production service that we could sell to clients. It is an AI assistant that works through a messenger and helps users manage their appointments with doctors at private clinics.

This all devolved into more than half a year of frustration and mental agony, and we are finally retiring and shutting down this project. I'm free. I AM FREE!!! I AM SO FUCKING FREE!!!

Now I want to share what I have "experienced" while implementing it.

First of all, the overall quality of open source models got really good over these 8 months, and they finally look competitive. You can build something genuinely usable with them, but with caveats. In my own personal experience and opinion, all LLMs are currently really good for personal, first-party, one-on-one usage. You "consume" what the LLM generates. You know that it won't be 100% correct, and if it shits itself, you can fix it yourself or make the LLM correct it.

However, when you provide your LLM-based service to a second party, who in turn provides their own service to a third party, things get very bad. You do not guarantee a 100% correct result, but your client promises their own clients that their service (which depends on yours) will always give a correct result. When it fails, and it will certainly fail, it will frustrate everyone and spoil everything.

Now let's begin.

If you look at my previous post, I had been making direct API calls through OpenRouter and handling all of that by myself. Readers of the previous post suggested using PydanticAI. I tried it and it was amazing. The documentation was great, and it offloaded all of those bulky direct API interactions, especially the ones with tools. It worked great in testing, but once it launched in production it started to show its own problems.

While PydanticAI can work in sync environments, it has been designed mostly with async in mind. Even its sync variants are actually some kind of weird tricks with async under the hood. If your whole architecture is sync, you are forced either to rewrite everything to async, which may be hard or impossible, or to use weird tricks to launch async loops inside your sync environment. It could literally hang our whole process and make it unresponsive, forcing us to use system-level kill commands.
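One common way to call async code from a sync codebase without starting a new event loop on every request is to keep a single loop in a background thread and submit coroutines to it with a timeout. The sketch below uses only the standard library. `BackgroundLoop` and `fake_agent_call` are hypothetical names for illustration, not part of PydanticAI, and nothing here is a claim about how PydanticAI works internally.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout


class BackgroundLoop:
    """Runs one asyncio event loop in a daemon thread for sync callers."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout: float):
        """Block the calling thread until the coroutine finishes or times out."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            future.cancel()  # request cancellation of the stuck task
            raise

    def close(self) -> None:
        """Stop the loop and wait for its thread to exit."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_agent_call(prompt: str, delay: float = 0.01) -> str:
    # Hypothetical stand-in for an async LLM or agent call.
    await asyncio.sleep(delay)
    return f"echo: {prompt}"
```

From sync code this would be used as `bg = BackgroundLoop()` once at startup and then `bg.run(fake_agent_call("hi"), timeout=30.0)` per request. The timeout only protects the caller: if a coroutine blocks the loop with sync work, the loop thread itself still hangs, so this reduces the problem but does not remove it.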

Now let's talk about OpenRouter and all the providers that work under it. I have been using:

- GLM (4.5, 5.0, 5.2)

- Deepseek

- Mimo

- Qwen

- ChatGPT

- Claude

- Minimax

I kept switching between multiple models and discovered that providers do not guarantee proper service uptime. Even the official model makers can shit themselves and return an empty response message instead of a proper error. Even if you use fallback providers, they can all shit themselves at the same time, breaking the whole flow.

Another problem is that simple user questions can make the model return broken structured data. Validation may sometimes fix that, but mostly it will shit itself. It looks something like this:
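A minimal sketch of the defensive pattern this implies: treat an empty message as a failure in its own right, then walk a fallback chain and collect every error. The provider callables are hypothetical placeholders for whatever client you actually use. This is not OpenRouter's API, and as said above it cannot help when every provider is down at once.

```python
from typing import Callable, List, Sequence, Tuple


class EmptyResponseError(RuntimeError):
    """Raised when a provider answers without error but with no usable text."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, text)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            text = call(prompt)
            # An empty or whitespace-only message counts as a failure.
            if not text or not text.strip():
                raise EmptyResponseError(f"{name} returned an empty message")
            return name, text
        except Exception as exc:
            errors.append(f"{name}: {exc!r}")
    # Every provider failed: surface all reasons in one error.
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors makes it possible to tell a real outage from a silent empty response when reading logs later.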

User: Hello, is the next day available?
Bot:
Bot:
Bot:
Validator:
Bot:
Bot:
Validator:
* THIS GOES MULTIPLE TIMES *
...
Bot: Throws an exception that it had shat itself and was unable to form proper answer

Now here is the problem with LLMs. A PydanticAI agent can expect structured output in the form of a Pydantic model. However, LLMs do not guarantee that they will return structured output. GitHub is filled with complaints about that. So the suggestion is to let the agent return either a raw string or the structured Pydantic model, which makes the LLM even looser, but at least it will return something, right? NO! Now you are forced to write complicated validators, because the model no longer cares about the field descriptions of your Pydantic models.

The problem is that even if you write hundreds of validations, with proper responses explaining how and why the structure is incorrect, there is a non-zero chance that it will fail so many times that it fucks up the whole process. Even forcing it to rerun won't help you! If the LLM has decided to shit itself, it will stay shat! You can add some extra words to nudge the generation in a different direction, but that is also a gamble. You can also increase the temperature so that reruns come out differently, but that opens the gate to other problems that I will describe below.

The next problem: a simple emoji in a user's text can break the bot's whole "character" and turn it into a weirdo. It looked like this:
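Since some runs will fail validation no matter what, one option is to cap the retries, feed the validator's reason back into the next attempt, and return a fixed handoff message instead of throwing. The sketch below uses plain `json` and a dataclass rather than PydanticAI to stay self-contained. `Reply`, `parse_reply`, `answer` and the `generate(prompt, feedback)` callable are all hypothetical names, and the JSON shape is an assumption for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    message: str
    handoff_to_human: bool = False


def parse_reply(raw: str) -> Reply:
    """Validate raw model text; raise ValueError with a reason the model can read."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("message"), str):
        raise ValueError('expected an object with a string field "message"')
    if not data["message"].strip():
        raise ValueError('"message" must not be empty')
    return Reply(message=data["message"])


def answer(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Reply:
    """Retry a bounded number of times, then degrade to a fixed handoff reply."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = generate(prompt, feedback)
        try:
            return parse_reply(raw)
        except ValueError as reason:
            feedback = str(reason)  # passed to the model on the next attempt
    # Deterministic fallback: the user gets an answer even if the model never recovers.
    return Reply(
        message="Sorry, I could not process that. A staff member will contact you.",
        handoff_to_human=True,
    )
```

This does not make the model any more reliable. It only bounds the cost of a bad run and keeps the end user from seeing an exception.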

User: Thank you for the help 🤩🎉
Bot: Ohhh. I'm so glad for you 😁😁. I'm so glad that everything went good for your son! 🎉🎉
User: What? I have received service and I don't have a son.
Bot: I'm so sorry 😅 for bringing up the son. But I'm still so glad for you.
