I built a local LLM NPC backend focused on NPC-to-NPC conversations

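Why it's recommended

This entry relates to progress in generative capabilities or on-device inference, and is useful for tracking model efficiency, deployment barriers, and application opportunities.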

It is a fully local speech-to-speech backend for LLM NPCs: speech-to-text, a local LLM, and text-to-speech, with no cloud services needed. The main focus was NPCs talking to each other rather than just answering the player, and my study looked at how players experience witnessing these NPC-to-NPC conversations and what that does for immersion.

The NPCs can talk to each other, remember what they said, and later use that context when the player talks to them. There is also a background Game Manager AI that can inject hidden behavioral notes into NPCs to steer the story a bit.
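A minimal Python sketch of the memory and steering idea, assuming each NPC keeps a plain transcript of the lines it has witnessed and the Game Manager's notes are simply added to that NPC's prompt. The post does not describe the actual data structures, so `NPC`, `record_line`, `game_manager_inject`, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NPC:
    """Hypothetical NPC record: a persona plus everything it has heard or said."""
    name: str
    persona: str
    memory: List[str] = field(default_factory=list)        # lines this NPC witnessed
    hidden_notes: List[str] = field(default_factory=list)  # Game Manager steering notes

    def build_prompt(self, player_line: str) -> str:
        # Assemble the context sent to the local LLM for this NPC's next reply.
        parts = [f"You are {self.name}. {self.persona}"]
        if self.hidden_notes:
            # Notes go into the prompt only; they are never spoken to the player.
            parts.append("Private direction: " + " ".join(self.hidden_notes))
        if self.memory:
            # Earlier NPC-to-NPC lines become context when the player speaks.
            parts.append("Earlier conversation:\n" + "\n".join(self.memory))
        parts.append(f"Player: {player_line}\n{self.name}:")
        return "\n\n".join(parts)


def record_line(listeners: List[NPC], speaker: str, text: str) -> None:
    """Store one spoken line in the memory of every NPC who heard it."""
    for npc in listeners:
        npc.memory.append(f"{speaker}: {text}")


def game_manager_inject(npc: NPC, note: str) -> None:
    """Hypothetical Game Manager hook: attach a hidden behavioral note."""
    npc.hidden_notes.append(note)


butler = NPC("Butler", "A nervous household servant.")
maid = NPC("Maid", "A sharp-eyed maid who notices everything.")
record_line([butler, maid], "Maid", "I saw a light in the study at midnight.")
record_line([butler, maid], "Butler", "You must have been dreaming.")
game_manager_inject(butler, "You are hiding that you were in the study.")
prompt = butler.build_prompt("Where were you last night?")
```

The maid remembers the same two lines but carries no hidden note, so only the butler is nudged when the player questions him.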

Latency was one of the main technical challenges. With Llama 3.2 3B for VR and a 7B model on a 4070 Ti, I was getting around 400 to 600 ms Time to First Audio (TTFA), which is roughly where it starts to feel like a real conversation instead of waiting for the NPC to think. It also runs alongside the Unity scene, as you can see in the demo.
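One common way to keep TTFA low, assumed here because the post does not say how the backend streams, is to hand each finished sentence to TTS while the LLM is still generating, so the first audio waits only for the first sentence rather than the whole reply. The sketch below shows that sentence chunking; `sentence_chunks`, `speak`, and the `synthesize` callback are hypothetical names, and the regex is a naive splitter that will also break after abbreviations such as "Dr.".

```python
import re
import time
from typing import Callable, Iterable, Iterator, List, Optional

# Naive sentence end: ., ! or ? (plus optional closing quote/bracket) followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*\s")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as the token stream finishes it."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush every sentence that is already complete in the buffer.
        while (match := _SENTENCE_END.search(buffer)) is not None:
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # final sentence, which has no trailing whitespace


def speak(tokens: Iterable[str], synthesize: Callable[[str], None]) -> Optional[float]:
    """Send sentences to TTS as they complete; return seconds until the first is synthesized."""
    start = time.perf_counter()
    first: Optional[float] = None
    for sentence in sentence_chunks(tokens):
        synthesize(sentence)  # hypothetical TTS call
        if first is None:
            first = time.perf_counter() - start
    return first


spoken: List[str] = []
ttfa = speak(["Who ", "are ", "you? ", "Leave ", "me ", "alone", "."], spoken.append)
```

`speak` only times the text-to-first-sentence-audio part; speech-to-text before the first token and audio playback on the client are not included.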

For multiple NPCs, I used a shared generation lock so the GPU does not get overloaded. Each NPC has its own LLM context/personality and TTS setup, but only one generates at a time. They take turns, and the switch between characters is basically instant, so it feels natural. The limitation is that two NPCs cannot literally speak over each other at the exact same moment.
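A minimal sketch of the shared-lock idea, assuming a Python asyncio backend (the post does not say what the backend is written in). Every NPC gets its own history and voice setting but receives the same `asyncio.Lock`, so concurrent requests are serialized on the GPU. `NPCAgent`, `generate`, and the `asyncio.sleep` stand-in for LLM and TTS work are hypothetical.

```python
import asyncio
from typing import List


class NPCAgent:
    """Hypothetical per-NPC state: own context and voice, shared GPU lock."""

    def __init__(self, name: str, voice: str, gpu_lock: asyncio.Lock):
        self.name = name
        self.voice = voice              # per-NPC TTS setting (placeholder)
        self.history: List[str] = []    # per-NPC LLM context
        self.gpu_lock = gpu_lock        # the same Lock object is given to every NPC

    async def generate(self, prompt: str, log: List[str]) -> str:
        async with self.gpu_lock:       # only one NPC generates at a time
            log.append(f"start {self.name}")
            await asyncio.sleep(0.01)   # stand-in for LLM + TTS work on the GPU
            reply = f"{self.name} replies to: {prompt}"
            self.history.append(reply)
            log.append(f"end {self.name}")
            return reply


async def main() -> List[str]:
    lock = asyncio.Lock()  # created inside the running event loop
    detective = NPCAgent("Detective", "voice_a", lock)
    suspect = NPCAgent("Suspect", "voice_b", lock)
    log: List[str] = []
    # Both requests are issued concurrently, but the lock serializes the GPU work.
    await asyncio.gather(
        detective.generate("Where were you?", log),
        suspect.generate("I want a lawyer.", log),
    )
    return log


events = asyncio.run(main())
```

Both `generate` calls start together, yet each NPC's start and end appear back to back in the log, which is the turn-taking described above; the trade-off is that no two NPCs ever generate at the same moment.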

It is WebSocket-based, so it should work with Unity, Unreal, or anything else that can talk over WebSockets. I also included the Unity scripts.
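Because the transport is plain WebSocket messages, an engine only needs to send and parse agreed-upon payloads. Below is a sketch of a JSON dispatch layer, assuming text frames that carry a `type` field; the message types and fields are hypothetical, not the author's actual protocol, and the socket handling itself is left out so the example uses only the standard library.

```python
import json
from typing import Callable, Dict

# Hypothetical message types; the real protocol is defined by the project's scripts.
Handler = Callable[[dict], dict]
HANDLERS: Dict[str, Handler] = {}


def handles(msg_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one incoming message type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[msg_type] = fn
        return fn
    return register


@handles("player_text")
def on_player_text(msg: dict) -> dict:
    # A real backend would run the LLM and TTS here and stream audio back.
    return {"type": "npc_reply", "npc": msg["npc"], "text": f"(reply to '{msg['text']}')"}


@handles("start_npc_conversation")
def on_start_conversation(msg: dict) -> dict:
    return {"type": "conversation_started", "npcs": msg["npcs"]}


def handle_frame(raw: str) -> str:
    """Decode one WebSocket text frame, dispatch it, and encode the response."""
    try:
        msg = json.loads(raw)
        handler = HANDLERS[msg["type"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"type": "error", "detail": "unrecognized message"})
    return json.dumps(handler(msg))
```

Any engine client (Unity, Unreal, or a test script) would just send a string such as `{"type": "player_text", "npc": "Butler", "text": "Hello"}` and parse the JSON it gets back.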

I would really like people to try it, build on it, or give feedback. To adapt it to your own game, the main work is tuning the 3-layer NPC prompt setup and the Game Manager prompt. That takes a bit of work, but it is very doable with AI help, and I think a lot of it could be automated later.

Demo video, detective game in Unity: https://www.youtube.com/watch?v=Z-WZ-Prl8bI

Topic tag: on-device inference

Source: reddit.com

Single source; not yet cross-verified.