Lee Boonstra
Raise your voice!
#1 about 1 minute
Building a custom voice AI with WebRTC and Google APIs
An overview of the architecture for streaming voice from a browser to a backend for processing with conversational AI.
#2 about 4 minutes
Comparing custom voice AI to public assistants
A custom voice AI provides more control over technical requirements and terms of service compared to public platforms like Google Assistant or Alexa.
#3 about 1 minute
Handling short versus long user utterances
Public assistants are optimized for short commands, whereas custom AI for use cases like contact centers must be designed to handle long, complex user stories.
#4 about 3 minutes
Demo of a voice-enabled self-service kiosk
A demonstration of a web-based airport kiosk that answers user questions spoken in different languages using a custom voice AI.
#5 about 1 minute
The core challenge of integrating voice technologies
The main difficulty in building a voice AI is not using individual APIs, but integrating the entire pipeline from frontend audio stream to backend processing.
#6 about 3 minutes
Capturing cross-browser microphone audio with RecordRTC
The RecordRTC library is used to abstract away browser inconsistencies and reliably capture microphone audio streams for processing.
#7 about 2 minutes
Streaming audio to the backend with Socket.IO
Socket.IO and the socket.io-stream module enable real-time, bidirectional streaming of binary audio data from the browser to a Node.js backend.
#8 about 3 minutes
Transcribing audio with the Speech-to-Text API
Google's Speech-to-Text API converts the incoming audio stream into text using a streaming recognition call that handles data as it arrives.
#9 about 4 minutes
Understanding user intent with Dialogflow
Dialogflow uses natural language understanding to match transcribed user text to predefined intents, entities, and knowledge bases to determine the user's goal.
#10 about 4 minutes
Adding multi-language support with the Translate API
The Translate API enables multi-language support by translating non-English input to English for Dialogflow processing and then translating the response back into the user's language.
#11 about 3 minutes
Generating audio responses with Text-to-Speech
The Text-to-Speech API synthesizes a natural-sounding voice from the text response, which is then sent back to the browser as an audio buffer to be played.
#12 about 1 minute
Deployment considerations and open source code
Browsers only grant microphone access to pages served over HTTPS, which services like App Engine Flex can provide with little configuration. The full project code is available on GitHub.
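
The multi-language step from chapter #10 (translate to English, match the intent, translate the answer back) can be sketched roughly as follows. This is a minimal illustration and not the code from the talk: the names `Translate`, `DetectIntent`, `answerInUserLanguage`, and `BASE_LANGUAGE` are hypothetical, and the two injected functions merely stand in for real Translate API and Dialogflow client calls, which are not shown.

```typescript
// Hypothetical function shapes standing in for real Translate API and
// Dialogflow client calls; neither matches an actual library signature.
type Translate = (text: string, from: string, to: string) => Promise<string>;
type DetectIntent = (englishText: string) => Promise<string>;

interface Reply {
  language: string; // language code of the reply text
  text: string;     // reply text in the user's language
}

// Assumption: the conversational agent is authored in English.
const BASE_LANGUAGE = "en";

async function answerInUserLanguage(
  userText: string,
  userLanguage: string,
  translate: Translate,
  detectIntent: DetectIntent,
): Promise<Reply> {
  // Skip both translation hops when the user already speaks the base language.
  const needsTranslation = userLanguage !== BASE_LANGUAGE;

  // 1. Bring the transcribed utterance into the agent's language.
  const englishText = needsTranslation
    ? await translate(userText, userLanguage, BASE_LANGUAGE)
    : userText;

  // 2. Let the agent match an intent and produce its (English) answer.
  const englishReply = await detectIntent(englishText);

  // 3. Translate the answer back before it is synthesized to speech.
  const text = needsTranslation
    ? await translate(englishReply, BASE_LANGUAGE, userLanguage)
    : englishReply;

  return { language: userLanguage, text };
}
```

Passing the two services in as functions keeps the flow testable with stubs; in a real backend they would wrap the respective client libraries.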