# Real-time Digital Human Interaction

NavTalk provides real-time digital human capabilities through a WebSocket + WebRTC integration, supporting speech recognition, function calling, and synchronized lip-synced video. The steps below walk through the complete implementation with key code examples.

## **Step 1: Establish a WebSocket Real-time Voice Connection**

Create a WebSocket connection using your license key and chosen character:

```javascript
const license = "YOUR_LICENSE_KEY";
const characterName = "navtalk.Leo";

const socket = new WebSocket(
  `wss://api.navtalk.ai/api/realtime-api?license=${encodeURIComponent(license)}&characterName=${encodeURIComponent(characterName)}`
);
socket.binaryType = 'arraybuffer';

// Connection event handlers
socket.onopen = () => console.log("WebSocket connection established");
socket.onmessage = (event) => {
  if (typeof event.data === 'string') {
    handleJSONMessage(JSON.parse(event.data));
  } else {
    handleAudioStream(event.data);
  }
};
```
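
The two handlers referenced above can start as minimal stubs. This sketch assumes the binary frames carry the raw PCM16 audio configured in Step 2; full JSON event handling is shown in Step 4:

```javascript
// Minimal dispatch for JSON control messages; see Step 4 for full handling.
function handleJSONMessage(message) {
  console.log("Received event:", message.type);
}

// Binary frames carry raw PCM16 audio; buffer the chunks for playback.
const audioQueue = [];
function handleAudioStream(arrayBuffer) {
  audioQueue.push(new Int16Array(arrayBuffer));
}
```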

## **Step 2: Configure Session Parameters**

Send the session configuration after receiving the `session.created` event:

```javascript
const sessionConfig = {
  type: "session.update",
  session: {
    instructions: "You are a friendly digital assistant",
    voice: "alloy",
    temperature: 0.7,
    max_response_output_tokens: 1024,
    modalities: ["text", "audio"],
    input_audio_format: "pcm16",
    output_audio_format: "pcm16",
    input_audio_transcription: { model: "whisper-1" },
    tools: [...]  // Optional: function calling configuration
  }
};
socket.send(JSON.stringify(sessionConfig));
```
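
The `tools` array appears to follow the OpenAI-style function schema (NavTalk's session events mirror the OpenAI Realtime API); the entry below is purely illustrative, and `get_weather` is not a NavTalk built-in:

```javascript
// Illustrative tool definition (OpenAI-style function schema; verify the
// exact shape against NavTalk's docs). `get_weather` is a made-up example.
const tools = [
  {
    type: "function",
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. Berlin" }
      },
      required: ["city"]
    }
  }
];
```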

## **Step 3: Capture and Send the Audio Stream**

Capture microphone audio in real time and stream it to the server. Note that `createScriptProcessor` is deprecated in the Web Audio API (superseded by `AudioWorkletNode`), but it remains widely supported and keeps this example simple:

```javascript
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  const audioContext = new AudioContext({ sampleRate: 24000 });
  const source = audioContext.createMediaStreamSource(stream);
  const processor = audioContext.createScriptProcessor(8192, 1, 1);

  processor.onaudioprocess = (event) => {
    const input = event.inputBuffer.getChannelData(0);
    const buffer = floatTo16BitPCM(input);
    const base64Audio = base64EncodeAudio(new Uint8Array(buffer));

    // Send audio chunks
    const chunkSize = 4096;
    for (let i = 0; i < base64Audio.length; i += chunkSize) {
      const chunk = base64Audio.slice(i, i + chunkSize);
      socket.send(JSON.stringify({ type: "input_audio_buffer.append", audio: chunk }));
    }
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
});

```
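
The snippet above relies on two helpers that are not defined in it; standard implementations look like this:

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatTo16BitPCM(float32Array) {
  const buffer = new ArrayBuffer(float32Array.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])); // clamp to range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Base64-encode a byte array in chunks to avoid call-stack limits
// that String.fromCharCode hits on large arrays.
function base64EncodeAudio(uint8Array) {
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < uint8Array.length; i += chunkSize) {
    binary += String.fromCharCode(...uint8Array.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```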

## **Step 4: Handle AI Response Events**

Process the different response types sent by the server:

```javascript
function handleReceivedMessage(data) {
  switch (data.type) {
    case "session.created":
      sendSessionUpdate();
      break;
    case "session.updated":
      startRecording();
      break;
    case "response.audio_transcript.delta":
      console.log("AI says:", data.delta.text);
      break;
    case "response.audio.delta":
      // Play data.delta audio content
      break;
    case "response.function_call_arguments.done":
      handleFunctionCall(data);
      break;
  }
}

```

| Event Type                              | Explanation                                                                   |
| --------------------------------------- | ----------------------------------------------------------------------------- |
| `session.created`                       | The session was created; send the `session.update` configuration immediately. |
| `session.updated`                       | The configuration was applied; you can start streaming audio.                 |
| `response.audio_transcript.delta`       | Incremental transcript of the AI's spoken response.                           |
| `response.audio.delta`                  | A chunk of AI audio data; play it as it arrives.                              |
| `response.function_call_arguments.done` | Function-call arguments are complete; trigger the function call.              |
| `response.audio.done`                   | The AI's audio response has finished.                                         |
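
The function-call handler can be sketched as below. NavTalk's event schema appears to mirror the OpenAI Realtime API, so the field names (`call_id`, `name`, `arguments`) and the `conversation.item.create` / `response.create` reply pattern are assumptions to verify against the official sample; the socket is passed in explicitly here, and `runTool` is a hypothetical dispatcher:

```javascript
// Hypothetical tool dispatcher; replace with your own implementations.
function runTool(name, args) {
  if (name === "get_weather") return { city: args.city, temp: "21°C" };
  return { error: `Unknown tool: ${name}` };
}

// Sketch of a function-call handler (field names are assumptions, see above).
function handleFunctionCall(data, socket) {
  const args = JSON.parse(data.arguments); // arguments arrive as a JSON string
  const result = runTool(data.name, args);

  // Return the tool result to the model...
  socket.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: data.call_id,
      output: JSON.stringify(result)
    }
  }));
  // ...then ask it to speak a follow-up response.
  socket.send(JSON.stringify({ type: "response.create" }));
}
```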

## **Step 5: Establish a WebRTC Video Connection**

Display the digital human's video stream in real time:

**HTML Setup:**

```html
<video id="avatar-video" autoplay muted playsinline 
       style="width: 320px; height: 400px; object-fit: cover;">
</video>
```

**WebRTC Connection:**

```javascript
const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Handle incoming video stream
peerConnection.ontrack = (event) => {
  document.getElementById('avatar-video').srcObject = event.streams[0];
};

// Signaling channel
const signalingSocket = new WebSocket(`wss://api.navtalk.ai/iwebrtc?userId=${license}`);
signalingSocket.onmessage = async (event) => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case "offer":
      await peerConnection.setRemoteDescription(message.sdp);
      const answer = await peerConnection.createAnswer();
      await peerConnection.setLocalDescription(answer);
      signalingSocket.send(JSON.stringify({
        type: "answer",
        targetSessionId: message.targetSessionId,
        sdp: answer
      }));
      break;
    case "iceCandidate":
      await peerConnection.addIceCandidate(message.candidate);
      break;
  }
};
```
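
When the conversation ends, release the microphone and close both transports. A minimal teardown helper (the name and signature are ours, not part of the NavTalk API) might look like:

```javascript
// Stop audio capture, close the video connection, and close both sockets.
// readyState 1 === WebSocket.OPEN.
function endSession(socket, signalingSocket, peerConnection, mediaStream) {
  if (mediaStream) mediaStream.getTracks().forEach((track) => track.stop());
  if (peerConnection) peerConnection.close();
  [socket, signalingSocket].forEach((ws) => {
    if (ws && ws.readyState === 1) ws.close();
  });
}
```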

## **Complete Example**

We recommend referring to the official sample project for quick validation: <https://github.com/navtalk/Sample>

**The example includes complete functionality:**

* **Audio Capture & Processing** - Real-time microphone input and audio stream handling
* **Digital Human Configuration** - Character appearance, voice, and behavior settings
* **Real-time Video Rendering** - WebRTC-based video streaming with lip synchronization
* **Function Call Integration** - Custom tool execution and API interactions
* **Conversation History** - Session recording and historical dialogue management
