Communicate with a real-time digital human

Example of the First Request: Interacting with a Real-time Digital Human

NavTalk provides real-time digital human capabilities over WebSocket + WebRTC, supporting speech recognition, function calling, and lip-synced video. Below is the complete integration flow with explanations of the key code.

Overall Process

  1. Establish a WebSocket connection (with license + characterName).

  2. Configure session parameters (voice, audio format, context, etc.).

  3. Capture audio from the browser's microphone, convert it to PCM, and send the audio stream to the server.

  4. Receive server responses (text, audio, function calls).

  5. Use WebRTC to display the digital human's real-time video.

🔹 Step 1: Establish a WebSocket Real-time Voice Connection

You need to create a WebSocket connection using the license we provide and pass the characterName to select the digital human's appearance.

const license = "YOUR_LICENSE_KEY";
const characterName = "girl2";

const socket = new WebSocket(`wss://api.navtalk.ai/api/realtime-api?license=${encodeURIComponent(license)}&characterName=${characterName}`);
socket.binaryType = 'arraybuffer';

socket.onopen = () => {
  console.log("WebSocket connection established successfully.");
};

socket.onmessage = (event) => {
  if (typeof event.data === 'string') {
    const data = JSON.parse(event.data);
    handleReceivedMessage(data); // Process JSON message
  } else if (event.data instanceof ArrayBuffer) {
    handleReceivedBinaryMessage(event.data); // Process audio stream
  }
};

🔹 Step 2: Configure Session Parameters (Initialization)

After session.created is returned, send session.update to configure the AI's behavior style, language model, audio parameters, transcription method, etc.
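
The exact session.update schema is defined by NavTalk; the snippet below is only a minimal sketch, and the field names (instructions, voice, input_audio_format, input_audio_transcription, tools) are assumptions, so check the official DEMO project for the authoritative configuration.

// Called after the server sends the "session.created" event.
// NOTE: the session fields below are illustrative placeholders, not the
// authoritative NavTalk schema -- refer to the official DEMO project.
function sendSessionUpdate() {
  socket.send(JSON.stringify({
    type: "session.update",
    session: {
      instructions: "You are a friendly assistant.",      // behavior style / context
      voice: "alloy",                                      // voice selection (placeholder)
      input_audio_format: "pcm16",                         // matches the PCM16 capture in Step 3
      output_audio_format: "pcm16",
      input_audio_transcription: { model: "whisper-1" },   // transcription method (placeholder)
      tools: []                                            // custom function-calling tools (see note below)
    }
  }));
}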

Extensibility: the tools field lets you register custom function-calling capabilities.

🔹 Step 3: Capture and Push the User's Voice

Access the microphone through the browser, record audio in real time, convert it to PCM16, and send it to the server base64-encoded.
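
A minimal capture loop might look like the sketch below. It uses a ScriptProcessorNode for brevity (an AudioWorklet is preferable in production); the "input_audio_buffer.append" event name, its "audio" field, and the 24 kHz sample rate are assumptions to verify against the DEMO project.

async function startMicrophone() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 24000 }); // sample rate is a placeholder
  const source = audioContext.createMediaStreamSource(stream);

  // ScriptProcessorNode keeps this sketch short; use an AudioWorklet in production.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);

    // Convert Float32 samples in [-1, 1] to 16-bit PCM.
    const pcm16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }

    // Base64-encode the PCM bytes and push them over the WebSocket.
    const bytes = new Uint8Array(pcm16.buffer);
    const base64Audio = btoa(String.fromCharCode(...bytes));
    socket.send(JSON.stringify({
      type: "input_audio_buffer.append", // event name is an assumption -- see the DEMO project
      audio: base64Audio
    }));
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}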

🔹 Step 4: Handle AI Response Events

The platform will return multiple events, mainly including:

| Event Type | Explanation |
| --- | --- |
| session.created | The session was created successfully; send the session configuration immediately. |
| session.updated | The configuration has been applied; you can start sending audio. |
| response.audio_transcript.delta | Streams the speech-recognition transcript in real time. |
| response.audio.delta | Streams the AI's audio data for playback. |
| response.function_call_arguments.done | The arguments of a function call are complete; trigger the corresponding function. |
| response.audio.done | The audio response has finished. |

Example:
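
The dispatcher below is a sketch over the event types listed above. sendSessionUpdate and startMicrophone come from the earlier steps; playAudioChunk and handleFunctionCall are placeholders for your own playback and function-calling logic, and the delta field names are assumptions.

function handleReceivedMessage(data) {
  switch (data.type) {
    case "session.created":
      sendSessionUpdate();                        // Step 2: send the session configuration
      break;
    case "session.updated":
      startMicrophone();                          // Step 3: safe to start streaming audio
      break;
    case "response.audio_transcript.delta":
      console.log("Transcript:", data.delta);     // real-time recognition text
      break;
    case "response.audio.delta":
      playAudioChunk(data.delta);                 // placeholder: decode and play the audio chunk
      break;
    case "response.function_call_arguments.done":
      handleFunctionCall(data);                   // placeholder: run your custom function
      break;
    case "response.audio.done":
      console.log("AI response finished.");
      break;
    default:
      console.log("Unhandled event:", data.type);
  }
}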

🔹 Step 5: Establish WebRTC Video Stream Connection (Display Digital Human)

WebRTC carries the digital human's real-time expressiveness (lip movement, facial expressions, gaze, etc.), so make sure you create the WebRTC video channel at the same time as the WebSocket real-time voice connection.

You will need:

  • An HTML <video> element to render the digital human's video.

  • Your license (used as the userId).

  • A target sessionId (associated with the real-time WebSocket session).

1️⃣ Bind the Video Element

Reserve a <video> tag in your HTML to display the digital human's appearance:
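
For example (the element id is just a placeholder referenced by the JavaScript below):

<!-- autoplay and playsinline are required for playback on mobile browsers -->
<video id="digitalHumanVideo" autoplay playsinline></video>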

Make sure the autoplay and playsinline attributes are present; they are required for the video to play on mobile browsers.

Then bind the element in JavaScript:
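
For example, assuming the placeholder id above:

const videoElement = document.getElementById("digitalHumanVideo");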

2️⃣ Establish WebRTC Signaling Connection

Create a WebSocket signaling channel using your license:
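
The sketch below assumes a placeholder signaling path and message handler; take the actual endpoint (and how the sessionId is passed) from the official DEMO project.

// The signaling path below is a placeholder -- replace it with the endpoint
// from the official DEMO project. sessionId must be associated with the
// real-time WebSocket session.
const signalingSocket = new WebSocket(
  `wss://api.navtalk.ai/SIGNALING_PATH?license=${encodeURIComponent(license)}&sessionId=${sessionId}`
);

signalingSocket.onopen = () => {
  console.log("WebRTC signaling channel established.");
};

signalingSocket.onmessage = (event) => {
  const message = JSON.parse(event.data);
  handleSignalingMessage(message); // defined in the next step
};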

3️⃣ Receive Offer / Answer / ICE Candidates

The server will return in sequence:

  • Offer (SDP request)

  • Answer (SDP response)

  • ICE Candidate (network hole punching address)

Handle these messages as follows:
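
A sketch of the handler follows. The message field names (type, sdp, candidate) are assumptions to match against the actual signaling messages, and peerConnection is created in the next step.

async function handleSignalingMessage(message) {
  switch (message.type) {
    case "offer": {
      // Apply the remote offer, then create an answer and send it back.
      await peerConnection.setRemoteDescription({ type: "offer", sdp: message.sdp });
      const answer = await peerConnection.createAnswer();
      await peerConnection.setLocalDescription(answer);
      signalingSocket.send(JSON.stringify({ type: "answer", sdp: answer.sdp }));
      break;
    }
    case "answer":
      await peerConnection.setRemoteDescription({ type: "answer", sdp: message.sdp });
      break;
    case "candidate":
      // Add the remote ICE candidate to the peer connection.
      await peerConnection.addIceCandidate(message.candidate);
      break;
  }
}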

4️⃣ Create RTCPeerConnection and Play Video
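
A minimal sketch (the public STUN server is an example, not a NavTalk requirement):

const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }] // any reachable STUN server works
});

// Attach the incoming video track to the <video> element bound earlier.
peerConnection.ontrack = (event) => {
  videoElement.srcObject = event.streams[0];
  videoElement.play().catch((err) => console.warn("Autoplay was blocked:", err));
};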

5️⃣ Receive ICE Reverse Channel
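
Locally gathered ICE candidates also have to travel back to the server over the signaling channel; the message shape below is an assumption to align with the DEMO project.

peerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    signalingSocket.send(JSON.stringify({
      type: "candidate",
      candidate: event.candidate
    }));
  }
};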

Common Issues and Debugging Suggestions

| Issue | Suggestion |
| --- | --- |
| No audio is returned | Check that session.update was sent and that the audio format is correct. |
| The video does not display | Check that the WebRTC connection succeeded and that the video element is bound. |
| The AI does not respond | Check that the audio stream is being sent successfully and that its format is correct. |
| ICE failed | Check the network environment and try a different STUN server. |

Complete Example Project

We recommend using the official DEMO project to quickly verify that your integration works: https://github.com/navtalk/Sample. The example covers recording, character selection, video rendering, and function calls end to end.
