Real-time conversational AI for voice systems
Design notes for on-device voice systems with explicit turn-taking, listener behavior, and affect-aware conversational policy.
- PUB 2026-06-02
- CAT notes
- STATE PUBLISHED
- TAGS 04
WORK / RESEARCH FOCUS
I work on the timing and dynamics of spoken interaction: when a system should speak, listen, backchannel, or hold back. The same practice spans ML evaluation, Swift/CoreML deployment, AI-agent orchestration, and technical direction for work that has to move from research into systems that are operated and maintained.
WORKING THREAD
John Brown works where timing becomes software behavior: audio timing, real-time voice AI, on-device systems, and the engineering habits needed to ship and maintain them.
The site collects technical notes, sound sketches, and visual studies without pretending they are the same kind of artifact. They share a concern with timing, control, and readable systems.
Audio, guitar, synthesis, and production keep the work grounded in latency, groove, texture, and restraint. This is where the ear for timing becomes an engineering constraint.
Those timing instincts become conversational policy: turn-taking, backchannels, voice activity detection (VAD) projection, affect signals, and evaluation tied to what the system should actually do.
Research has to survive on-device budgets, Swift/CoreML deployment, streaming state, fixed hop sizes, and pre-allocated audio loops.
ADRs, release gates, notes, images, sound sketches, and repos make the work inspectable so another person can pick it up cold.
WORK / RESEARCH
The technical center is real-time, on-device interaction. Music stays close because timing, tools, and constraints show up there too; image work can stay quiet until there is a fuller public set.
Turn-taking, backchannel behavior, VAD projection, and affect-aware policy for systems that know when to speak, listen, or hold back.
Hard real-time streaming paths with causal models, bounded state, fixed hop budgets, and clear fallbacks under frame pressure.
Protocol servers, model-provider abstractions, agent gateways, lifecycle tooling, and wave-structured multi-agent delivery.
ADRs, staged releases, roadmaps, acceptance gates, and interface contracts that keep research-to-deployment work operable.
FEATURED ENTRY POINTS
Start with technical notes, then scan music entries for the same timing and systems concerns in another medium.
Design notes for on-device voice systems with explicit turn-taking, listener behavior, and affect-aware conversational policy.
A public index of SoundCloud sketches, modular patches, and track fragments under the johnthomas profile.
A practical direction for agentic systems: protocol-first tooling, multi-provider abstractions, and staged delivery with interface contracts.
Notes on using Astro, structured content, GitHub Actions, and SSH deployment as a small personal publishing system.
SECTION MAP
Start with the routes that have enough public material right now: notes and music.
PLATFORMS / CONTACT
One compact surface for collaboration, technical context, sound sketches, and public work.
Preferred contact surface
For collaboration, project alignment, or technical direction requests, reach out through GitHub or LinkedIn and include your timeline, scope, and constraints.