Case Study

Category:
Speech AI, Real-Time Systems, and Architecture Engineering
Impact:
3 Months | <200 ms Latency
OpenAvatarChat is a real-time avatar interaction framework designed for synchronous audio processing and user engagement. The goal was to integrate Amazon's Nova Sonic voice AI to enable natural, low-latency conversational speech between avatars and users. The key challenge emerged from a fundamental architectural mismatch: Nova Sonic operates on asynchronous bidirectional streaming with persistent connections, while OpenAvatarChat processes audio synchronously in discrete cycles. A direct integration between these two paradigms risked deadlocks, latency spikes, and unstable connections, making a new system design essential.
Devised an isolation-based architecture to separate Nova Sonic's async streaming from OpenAvatarChat's synchronous cycles. Nova Sonic runs as a dedicated subprocess, maintaining its persistent async streaming session independently.
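The isolation idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the child program is a hypothetical stand-in that runs its own asyncio loop and simply upper-cases each line it receives, where the real worker would hold the Nova Sonic streaming session. The names CHILD_SOURCE, start_worker, round_trip, and stop_worker are assumptions made for this example, and plain stdin/stdout pipes are used only to keep the sketch short.

```python
import subprocess
import sys

# Hypothetical child program: stands in for the process that would own the
# streaming session. It runs its own asyncio loop and echoes each line it
# receives, upper-cased, so the round trip is observable.
CHILD_SOURCE = r"""
import asyncio, sys

async def main():
    loop = asyncio.get_running_loop()
    while True:
        # Read stdin in a worker thread so the event loop is never blocked.
        line = await loop.run_in_executor(None, sys.stdin.readline)
        if not line:  # EOF: the parent closed the pipe, so shut down
            break
        sys.stdout.write(line.upper())
        sys.stdout.flush()

asyncio.run(main())
"""


def start_worker():
    """Launch the async worker in its own interpreter process."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes in text mode
    )


def round_trip(worker, message):
    """Send one line to the worker and block until its reply arrives."""
    worker.stdin.write(message + "\n")
    worker.stdin.flush()
    return worker.stdout.readline().rstrip("\n")


def stop_worker(worker):
    """Close stdin to signal EOF, then wait for the worker to exit."""
    worker.stdin.close()
    code = worker.wait(timeout=10)
    worker.stdout.close()
    return code
```

Because the worker lives in a separate process, a stall or crash in its event loop cannot block the caller's synchronous cycle; the caller only ever touches the pipe ends.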
Implemented lock-free queues for bidirectional communication between the two processes. Outgoing audio data from OpenAvatarChat is fed asynchronously to Nova Sonic. Nova Sonic streams back real-time synthesized voice responses through the IPC channel.
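The case study does not say which queue implementation was used. One common way to get a lock-free queue is a single-producer/single-consumer ring buffer, sketched below under that assumption. SpscRing and its methods are hypothetical names. The sketch is in-process only: a cross-process version would place the slots and indices in shared memory and rely on atomic index updates, which pure Python does not provide.

```python
class SpscRing:
    """Fixed-capacity single-producer / single-consumer ring buffer.

    The producer only writes `_tail`; the consumer only writes `_head`.
    Because no index has two writers, neither side needs a lock.
    Items must not be None, since None is used to signal "empty".
    """

    def __init__(self, capacity):
        # One slot stays unused so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read (consumer-owned)
        self._tail = 0  # next slot to write (producer-owned)

    def try_push(self, item):
        """Store item and return True, or return False if the ring is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been written
        return True

    def try_pop(self):
        """Return the oldest item, or None if the ring is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._slots)
        return item


# One ring per direction gives bidirectional communication.
to_model = SpscRing(capacity=64)    # captured audio frames, outbound
from_model = SpscRing(capacity=64)  # synthesized audio chunks, inbound
```

Neither call ever waits: a full ring makes try_push return False and an empty ring makes try_pop return None, so the caller decides whether to drop, retry, or move on to the next cycle.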
Managed concurrency with non-blocking event loops so that audio flows without added buffering delay. Kept OpenAvatarChat's discrete audio frames synchronized with Nova Sonic's continuous stream for smooth, real-time response generation.
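One way to let a synchronous frame cycle feed an event loop without ever blocking on it is sketched below. It is an illustration under assumptions, not the project's implementation: AsyncBridge is a hypothetical class, the loop runs on a background thread rather than in a separate process to keep the example short, and _process merely sleeps and reverses the bytes where real code would await the streaming session.

```python
import asyncio
import queue
import threading


class AsyncBridge:
    """Runs an asyncio loop on a background thread and exposes
    non-blocking submit/poll calls to synchronous code."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._outbox = queue.Queue()  # finished results for the sync side
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True
        )
        self._thread.start()

    async def _process(self, frame):
        # Placeholder for awaiting the streaming session (assumption).
        await asyncio.sleep(0.01)
        self._outbox.put(frame[::-1])

    def submit(self, frame):
        """Schedule a frame on the loop and return immediately."""
        return asyncio.run_coroutine_threadsafe(
            self._process(frame), self._loop
        )

    def poll(self):
        """Return every result that is ready right now, without waiting."""
        ready = []
        while True:
            try:
                ready.append(self._outbox.get_nowait())
            except queue.Empty:
                return ready

    def close(self):
        """Stop the loop, wait for its thread, and release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()
```

In each synchronous cycle the caller would call submit(frame) with the newly captured audio and then poll() for whatever output has arrived; an empty list simply means nothing is ready yet, and the cycle continues on schedule.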
Stress-tested the integration under continuous, multi-turn conversations to validate latency stability, throughput efficiency, and fault recovery.
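A latency check of this kind can be sketched as below. The helper names percentile and measure_turns are hypothetical, and the case study does not describe its actual test harness. The 200 ms budget in the usage note is taken from the headline figure above; treating it as a 95th-percentile bound is an assumption.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile of samples, for 0 < p <= 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_turns(round_trip, turns):
    """Call round_trip() `turns` times and return each latency in ms."""
    latencies = []
    for _ in range(turns):
        start = time.perf_counter()
        round_trip()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

A stress run would pass a callable that performs one full conversational turn, collect the latencies over many turns, and compare a tail statistic such as percentile(latencies, 95) against the 200 ms budget, since tail latency reveals spikes that an average would hide.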