NPCs that truly talk, listen, remember, and move their face in sync with what they’re saying — in real time, in your game engine, for free. Convai just shipped v4 and it changes the NPC equation entirely.
The Story
If you’ve been following game AI, you know the name Convai. Founded in 2022 by Purnendu Mukherjee — a former NVIDIA conversational AI engineer who built Jensen Huang’s first 3D avatar chatbot at GTC 2020 — Convai has been quietly building the infrastructure for truly living NPCs.
V4 is not a minor patch. It’s a full rebuild of the communication layer and a new AI model for facial animation. Here’s what just landed:
- WebRTC protocol — the old gRPC streaming stack is gone, replaced with a persistent WebRTC connection that drops latency to under 1.5 seconds end-to-end (voice in → AI response → lipsync out)
- NeuroSync — Convai’s own transformer-based neural model that converts audio to 52 ARKit-compatible facial blendshapes at 60 fps, in real time
- HandsFree VAD — Voice Activity Detection: no more push-to-talk. The NPC just listens, like a person would
- Long-term memory (Mimir) — Characters now remember you across multiple sessions, using a 5-layer memory system that combines short-term verbatim recall, LLM-generated summaries, and hybrid semantic search retrieval
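If you've never implemented Voice Activity Detection, the core idea is simple: classify short audio frames as speech or silence, and close the speaker's turn after enough trailing silence. Convai's HandsFree VAD uses a trained model under the hood; this toy energy-threshold sketch (the threshold and hangover values are made-up illustration numbers, not Convai's) only shows the control flow around that decision:

```python
# Toy VAD: classify 20 ms audio frames as speech/silence by RMS energy,
# and end the speaker's "turn" after enough consecutive silent frames.
# A real VAD replaces the rms() check with a learned classifier.

def rms(frame):
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_turns(frames, threshold=0.02, hangover=15):
    """Yield (start, end) frame indices of detected speech segments.
    hangover: silent frames tolerated before the segment is closed."""
    start, silent = None, 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i          # speech onset
            silent = 0
        elif start is not None:
            silent += 1
            if silent > hangover:  # enough trailing silence: turn is over
                yield (start, i - silent)
                start, silent = None, 0
    if start is not None:          # speech ran to the end of the buffer
        yield (start, len(frames) - 1)

speech, silence = [0.5] * 160, [0.0] * 160
turns = list(detect_turns([silence] * 5 + [speech] * 10 + [silence] * 20))
# one speech segment spanning frames 5..14
```

In a real pipeline the per-frame decision comes from a neural classifier, but the turn-segmentation logic wrapped around it looks much like this.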
Both the Unity Asset Store plugin (v4.0.0) and the Unreal Engine FAB plugin shipped in late March 2026, and both are free.

NeuroSync: The Technical Bit That Matters
NeuroSync is the piece I want to zoom in on, because it solves a problem that’s been ugly for years. Getting believable lipsync on real-time AI dialogue used to mean choosing between: (a) crude phoneme matching that looks robotic, (b) expensive per-character fine-tuning, or (c) baking animations offline — which kills the whole point of dynamic AI dialogue.
NeuroSync is a transformer-based Seq2Seq model. Its encoder processes 128 frames of audio features; its decoder uses cross-attention to generate blendshape coefficients. The output is a tensor of shape (B, T, 68): per frame, 52 ARKit blendshape channels (eye blinks, jaw open, mouth smile, brow raises…), 9 head/eye rotation parameters, and 7 emotion intensity values.
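Consuming an output like that is just tensor slicing. A minimal sketch: the 52/9/7 split is described above, but the channel ordering here (blendshapes, then rotations, then emotions) is my assumption for illustration:

```python
import numpy as np

# Split a NeuroSync-style (B, T, 68) output into its three channel groups.
# The ordering of the groups is assumed, not documented here.
B, T = 1, 60                      # one clip, 60 frames (1 s at 60 fps)
out = np.random.rand(B, T, 68).astype(np.float32)

blendshapes = out[..., :52]       # 52 ARKit coefficients per frame
rotations   = out[..., 52:61]     # 9 head/eye rotation parameters
emotions    = out[..., 61:68]     # 7 emotion intensity values

assert blendshapes.shape == (B, T, 52)
assert rotations.shape == (B, T, 9)
assert emotions.shape == (B, T, 7)
```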
The model is small by design: it's MIT-licensed on Hugging Face and can run locally. In production, the Convai SDK runs it cloud-side and streams results back over WebRTC. The result: a MetaHuman's 250+ facial blendshapes firing in sync with your AI dialogue, at 60 fps, with emotion overlay.
It integrates with Unreal Engine’s LiveLink API and supports MetaHuman, Reallusion CC4/CC5, ARKit-standard custom rigs, ReadyPlayerMe, Daz3D, and Avaturn characters. For Unity it drives any ARKit-mapped blend shape setup. You don’t need to retarget or configure anything by hand.
Why You Should Care
Because the cost of entry for believable, conversational NPCs just dropped to zero.
The Mimir memory system is what makes this genuinely different from a chatbot duct-taped to a character. NPCs running Convai v4 track conversation across sessions using a 5-layer hierarchy: scene awareness (what’s around them via vision), short-term verbatim turns, medium-term LLM-generated summaries, long-term consolidated memories scored for importance, and a working memory layer that composes the final prompt. Retrieval uses hybrid BM25 + semantic search with recency decay. This is how memory actually works, not just “here’s the last 10 messages.”
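To make "hybrid BM25 + semantic search with recency decay" concrete, here's a minimal sketch of the scoring idea. Plain term overlap stands in for BM25, toy 3-d vectors stand in for real embeddings, and the weights and half-life are invented illustration values; none of this is Convai's actual implementation:

```python
import math

# Hybrid memory retrieval sketch: lexical score + embedding similarity,
# discounted by memory age. Term overlap stands in for BM25; the 3-d
# vectors stand in for embeddings; weights/half-life are made up.

def lexical(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)          # fraction of query terms present

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def score(query, q_vec, memory, now, w_lex=0.4, w_sem=0.6, half_life=24.0):
    decay = 0.5 ** ((now - memory["t"]) / half_life)   # recency decay
    return decay * (w_lex * lexical(query, memory["text"])
                    + w_sem * cosine(q_vec, memory["vec"]))

memories = [
    {"text": "player traded a jetpack for ore", "vec": [0.9, 0.1, 0.0], "t": 10.0},
    {"text": "player asked about the old mine", "vec": [0.1, 0.8, 0.2], "t": 70.0},
]
q = "where is the jetpack"
ranked = sorted(memories, key=lambda m: -score(q, [0.95, 0.05, 0.0], m, now=72.0))
```

Even the toy version shows the interesting property: an old memory needs a much stronger lexical or semantic match to outrank a fresh one.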
The Narrative Design system on top of that lets you define NPC objectives using a graph of sections, triggers (spatial, time-based, event-based), and AI-driven decision points — so you get the control of scripted dialogue combined with the flexibility of generative AI. KRAFTON used this with NVIDIA ACE for PUBG co-player characters. NetEase deployed it in NARAKA: BLADEPOINT for local AI teammates. These are not indie experiments. These are shipped games.
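The "graph of sections and triggers" maps naturally onto a small state machine: each section names its follow-ups, and each edge is gated by a trigger predicate (spatial, timed, or event-based). A sketch under assumed names; this mirrors the concept, not Convai's actual Narrative Design API:

```python
# Narrative-graph sketch: sections connected by trigger-gated edges.
# near() is a spatial trigger; the inventory check is an event trigger.

def near(pos, target, radius):
    return sum((a - b) ** 2 for a, b in zip(pos, target)) ** 0.5 <= radius

class Section:
    def __init__(self, name, line):
        self.name, self.line = name, line
        self.edges = []                  # (trigger_fn, next_section)

    def add_edge(self, trigger, nxt):
        self.edges.append((trigger, nxt))

    def step(self, state):
        """Advance to the first section whose trigger fires, else stay put."""
        for trigger, nxt in self.edges:
            if trigger(state):
                return nxt
        return self

greet  = Section("greet",  "Welcome, traveler.")
quest  = Section("quest",  "Fetch me a jetpack from the hangar.")
thanks = Section("thanks", "You actually found it!")

greet.add_edge(lambda s: near(s["player_pos"], (0, 0), 2.0), quest)
quest.add_edge(lambda s: "jetpack" in s["inventory"], thanks)

state = {"player_pos": (1.0, 1.0), "inventory": set()}
current = greet.step(state)              # spatial trigger fires -> quest
state["inventory"].add("jetpack")
current = current.step(state)            # event trigger fires -> thanks
```

The AI-driven decision points slot in as edges whose trigger asks the LLM instead of checking world state, which is what gives you scripted control and generative flexibility in one graph.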
The action system is also worth calling out: NPCs can execute Atomic Actions (Move, PickUp, Drop, Dance) and Complex Actions (multi-step sequences like “fetch me a jetpack” → navigate → grab → return → hand off). The character reasons through physical feasibility, its own mental state, and the scene context before deciding what to do.
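Conceptually, a Complex Action is a short-circuiting pipeline of Atomic Actions: each step checks its own feasibility and the sequence aborts on the first failure. A toy sketch of that pattern for the jetpack example; the function names and world dict are illustrative, not Convai's action API:

```python
# Complex action = ordered atomic steps that abort on the first failure.
# Feasibility is modeled crudely: pick_up only works if the item is
# actually at the NPC's position.

def move_to(world, target):
    world["npc_pos"] = target
    return True

def pick_up(world, item):
    if world["items"].get(item) == world["npc_pos"]:
        world["carrying"] = item
        del world["items"][item]
        return True
    return False                                     # not feasible

def hand_off(world, key):
    if world.get("carrying"):
        world[key] = world.pop("carrying")
        return True
    return False

def fetch(world, item):
    """'Fetch me a jetpack': navigate -> grab -> return -> hand off."""
    steps = [
        lambda w: item in w["items"] and move_to(w, w["items"][item]),
        lambda w: pick_up(w, item),
        lambda w: move_to(w, w["player_pos"]),
        lambda w: hand_off(w, "player_item"),
    ]
    return all(step(world) for step in steps)        # short-circuits on failure

world = {"npc_pos": (0, 0), "player_pos": (5, 5),
         "items": {"jetpack": (9, 2)}, "carrying": None}
ok = fetch(world, "jetpack")
```

Convai's version layers reasoning about mental state and scene context on top before committing to a plan, but the sequencing skeleton looks like this.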
Try It / Follow Them
- Unity plugin (free): Unity Asset Store — Convai NPC AI Engine v4.0.0
- Unreal Engine plugin (free): Search “Convai” on fab.com or Epic Games Launcher
- NeuroSync model (MIT license): Hugging Face — convaitech/NEUROSYNC
- NeuroSync local API (open-source): GitHub — AnimaVR/NeuroSync_Local_API
- Project Neural Nexus demo: conv.ai/ProjectNeuralNexus
- Convai YouTube (tutorials + demos): youtube.com/@convai
- MetaHuman integration guide: Convai Blog
IK3D Lab Take
The free plugin, the MIT-licensed NeuroSync model, the MetaHuman and Reallusion support — Convai is clearly betting on developer adoption first and monetization second. That’s the right call. Fifteen thousand developers are already in the ecosystem, and now the v4 stack is actually good enough to ship with.
What’s most interesting to me isn’t the dialogue (LLMs are everywhere) — it’s the memory architecture and the real-time facial animation pipeline as a single integrated stack. Those two pieces together are what make it feel like a character, not a chatbot-in-a-costume. The fact that you can run NeuroSync locally, swap in your own LLM via BYOLLM, and drive a MetaHuman or a CC5 rig without any custom code is genuinely impressive for a free tool.
If you’re building anything interactive — a game, a training simulation, an XR experience — this is the toolkit to evaluate right now. The NPC quality ceiling just got significantly higher, and the floor price just hit zero.