Interface Call

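NavTalk supports nine ways to generate digital human videos. They fall into three types, depending on what drives the avatar: a video (video-driven), a still image (image-driven), or a built-in system character (system character-driven). Each type accepts speech as an audio URL, as Base64-encoded audio, or as text that is converted to speech (TTS). The examples below cover all nine combinations.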

API Request Examples

① video + audio URL

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "video_url": "https://example.com/video.mp4",
       "audio_url": "https://example.com/audio.mp3"
     }'
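Most of these examples pass media by URL (video_url, image_url, audio_url), and the parameter description asks for a public video link. A private or mistyped URL is therefore a likely cause of a failed task. The sketch below is a hypothetical pre-flight check that sends a HEAD request with curl and reports whether a URL is reachable. The function name navtalk_check_url is an illustration, not part of the NavTalk API. It assumes the media host answers HEAD requests, which some signed-URL storage services do not.

```bash
# Hypothetical pre-flight check (not a NavTalk API): confirm that a media URL
# is reachable before putting it in a request body.
navtalk_check_url() {
  # -s: silent, -f: fail on HTTP status >= 400, -I: HEAD request,
  # -L: follow redirects, -o /dev/null: discard the response headers
  if curl -sfIL -o /dev/null "$1"; then
    echo "ok: $1"
  else
    echo "unreachable: $1" >&2
    return 1
  fi
}

# Usage (the URLs from the example above are placeholders):
# navtalk_check_url "https://example.com/video.mp4" &&
#   navtalk_check_url "https://example.com/audio.mp3"
```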

② video + audio Base64

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "video_url": "https://example.com/video.mp4",
       "audio_base64": "base64-audio-data"
     }'
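The example uses a placeholder for audio_base64. The sketch below shows one way to produce the value from a local file and build the request body. It assumes the API expects the plain Base64 of the file bytes with no "data:audio/..." prefix. The documentation only shows "base64-audio-data", so treat this as an assumption. The function name navtalk_audio_b64_body is hypothetical.

```bash
# Hypothetical helper: print a "video + audio Base64" request body built from
# a local audio file. Assumes plain Base64 of the file bytes (no data-URI prefix).
navtalk_audio_b64_body() {
  # $1: license key, $2: public video URL, $3: path to a local MP3/WAV file
  # The license and URL are inserted verbatim, so they must not contain
  # double quotes or backslashes.
  local b64
  [ -r "$3" ] || { echo "error: cannot read $3" >&2; return 1; }
  # GNU base64 wraps its output at 76 columns; strip the newlines so the
  # value is a single JSON string.
  b64=$(base64 < "$3" | tr -d '\n')
  printf '{"license": "%s", "video_url": "%s", "audio_base64": "%s"}\n' \
    "$1" "$2" "$b64"
}

# Usage: stream the body to curl on stdin, which avoids command-line
# length limits for longer audio files.
# navtalk_audio_b64_body sk-xxx https://example.com/video.mp4 ./speech.mp3 |
#   curl -X POST "https://app.navtalk.ai/generate" \
#     -H "Content-Type: application/json" --data-binary @-
```

The same approach works for examples ⑥ and ⑧ if you swap video_url for image_url or character_name in the printf format string.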

③ video + text (TTS)

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "video_url": "https://example.com/video.mp4",
       "content": "Welcome to NavTalk.",
       "voice": "nova"
     }'

④ image + text (TTS)

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "image_url": "https://example.com/photo.jpg",
       "content": "Welcome to NavTalk.",
       "voice": "echo"
     }'

⑤ image + audio URL

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "image_url": "https://example.com/photo.jpg",
       "audio_url": "https://example.com/audio.mp3"
     }'

⑥ image + audio Base64

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "image_url": "https://example.com/photo.jpg",
       "audio_base64": "base64-audio-data"
     }'

⑦ system role + audio URL

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "character_name": "girl2",
       "audio_url": "https://example.com/audio.mp3"
     }'

⑧ system role + audio Base64

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "character_name": "girl2",
       "audio_base64": "base64-audio-data"
     }'

⑨ system role + text (TTS)

curl -X POST "https://app.navtalk.ai/generate" \
 -H "Content-Type: application/json" \
 -d '{
       "license": "sk-xxx",
       "character_name": "girl2",
       "content": "Welcome to NavTalk.",
       "voice": "fable"
     }'

Response Handling

All generation requests are processed asynchronously. Submitting a request returns a task_id, which you then use to query the task status and retrieve the final video URL.

Step 1: Submit Task

Send any of the requests above. A successful submission returns:

{
  "status": "started",
  "task_id": "14cb760f-05ac-4fd3-a82c-e841f2f005d0"
}
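Step 2 needs the task_id from this response. If jq is installed, `jq -r '.task_id'` extracts it directly. Without jq, the hypothetical helper below extracts it with sed. This assumes the value is a plain string without escaped quotes, which holds for the UUID shown above.

```bash
# Hypothetical helper: read a submit response on stdin and print its task_id.
navtalk_task_id() {
  # Join lines first so pretty-printed (multi-line) JSON also matches,
  # then capture the string value that follows "task_id":
  tr -d '\n' |
    sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage:
# task_id=$(curl -sS -X POST "https://app.navtalk.ai/generate" \
#   -H "Content-Type: application/json" -d "$body" | navtalk_task_id)
```

An empty result means the response carried no task_id, so check the submission before polling.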

Step 2: Query Task Status

Pass your license key and the returned task_id as query parameters to check the processing result:

curl -X GET "https://api.navtalk.ai/query_status?license=YOUR_LICENSE&task_id=14cb760f-05ac-4fd3-a82c-e841f2f005d0"

Successful response:

{
  "status": "done",
  "video_url": "https://easyaistorageaccount.blob.core.windows.net/easyai/uploadFiles/2025/05/09/xxx.mp4"
}

📌 Status Explanation:

| Status | Meaning |
|---|---|
| started | The task has been created and is being processed. |
| processing | The video is being composed. |
| done | The task completed successfully; the video can be downloaded from video_url. |
| failed | Synthesis failed. Retry the request or check the error message. |
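Because status moves from started or processing to either done or failed, a client typically polls until it sees one of the two terminal values. The sketch below is a hypothetical polling loop. navtalk_wait, NAVTALK_POLL_INTERVAL, and NAVTALK_MAX_ATTEMPTS are illustrative names, not NavTalk features. The documentation does not specify a polling rate, so the 5-second interval and 120-attempt cap are arbitrary choices. The loop reads status with sed rather than a JSON parser, which assumes the flat response shape shown above.

```bash
# Hypothetical polling loop: query the task until it is "done" or "failed".
# Prints the final JSON on success; returns 2 on "failed", 1 on timeout or
# an unexpected response.
navtalk_wait() {
  # $1: license key, $2: task_id
  local attempt=0 resp status
  while [ "$attempt" -lt "${NAVTALK_MAX_ATTEMPTS:-120}" ]; do
    # -G turns the --data-urlencode pairs into a URL-encoded query string
    resp=$(curl -sS -G "https://api.navtalk.ai/query_status" \
      --data-urlencode "license=$1" \
      --data-urlencode "task_id=$2") || return 1
    # Extract the "status" string (assumes a flat, unescaped value)
    status=$(printf '%s' "$resp" | tr -d '\n' |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$status" in
      done) printf '%s\n' "$resp"; return 0 ;;
      failed) printf '%s\n' "$resp" >&2; return 2 ;;
      started|processing) ;;   # still running: poll again
      *) printf 'unexpected response: %s\n' "$resp" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "${NAVTALK_POLL_INTERVAL:-5}"
  done
  printf 'timed out waiting for task %s\n' "$2" >&2
  return 1
}

# Usage:
# navtalk_wait "YOUR_LICENSE" "14cb760f-05ac-4fd3-a82c-e841f2f005d0"
```

Distinct exit codes let a calling script tell a failed synthesis (worth retrying, per the table above) apart from a timeout or malformed response.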

Recommendations

Keep each video under 30 seconds (that is, keep the audio or text short) for faster processing.
Use clear, front-facing images/videos for optimal results.
Advanced parameters (e.g., face cropping, mouth openness) are available for fine-tuning but are optional. Defaults work well for most cases.

Advanced Face Control Parameters

NavTalk supports optional parameters inherited from MuseTalk for fine-tuning face cropping, mouth openness, and blending. These controls are advanced and should be used only when needed.

| Parameter | Default | Allowed / Recommended Range | Description |
|---|---|---|---|
| bbox_shift | 0 | Typically [-9, 9], based on runtime adjustment experiments | Vertical shift of the face crop box. Positive values move the crop downward (mouth more open); negative values move it upward (mouth less open). Start at 0 and test within the suggested range. |
| extra_margin | 10 | Recommended trial range: [0, 50] (no official bounds provided) | Extra margin, in pixels, added around the face crop. Widens the buffer area to prevent clipping of the chin, hair, or jaw. |
| parsing_mode | "jaw" | Currently supports "jaw" or "raw" | Defines how facial regions, especially around the jawline, are parsed and blended. |
| left_cheek_width | 90 | Suggested trial range: [50, 150] (no official bounds) | Width, in pixels, of the blending region on the left cheek. Increase it to make blending seams less visible. |
| right_cheek_width | 90 | Suggested trial range: [50, 150] (no official bounds) | Width, in pixels, of the blending region on the right cheek. Works the same way as left_cheek_width. |
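The documentation does not show where these fields go in a request. The sketch below assumes they are sent as extra top-level JSON fields next to license, and that assumption should be checked before relying on it. The hypothetical helper checks each value against the ranges in the table. Out-of-range values only trigger a warning because those ranges are recommendations, not documented limits. The helper then prints the fields as a JSON fragment to splice into a request body.

```bash
# Hypothetical helpers (not NavTalk APIs). Assumption: the advanced fields
# are top-level JSON keys in the /generate request body.

# Fail on a non-integer; warn (but succeed) if integer $2 is outside [$3, $4].
_navtalk_int_in_range() {
  # $1: field name, $2: value, $3: low, $4: high
  case "$2" in
    ''|-|*[!0-9-]*|?*-*)   # empty, lone "-", non-digit, or "-" after the start
      echo "error: $1 must be an integer, got '$2'" >&2
      return 1 ;;
  esac
  if [ "$2" -lt "$3" ] || [ "$2" -gt "$4" ]; then
    echo "warning: $1=$2 is outside the suggested range [$3, $4]" >&2
  fi
}

navtalk_face_params() {
  # $1 bbox_shift (default 0), $2 extra_margin (10), $3 parsing_mode (jaw),
  # $4 left_cheek_width (90), $5 right_cheek_width (90)
  _navtalk_int_in_range bbox_shift "$1" -9 9 || return 1
  _navtalk_int_in_range extra_margin "$2" 0 50 || return 1
  case "$3" in
    jaw|raw) ;;
    *) echo "error: parsing_mode must be jaw or raw, got '$3'" >&2; return 1 ;;
  esac
  _navtalk_int_in_range left_cheek_width "$4" 50 150 || return 1
  _navtalk_int_in_range right_cheek_width "$5" 50 150 || return 1
  printf '"bbox_shift": %s, "extra_margin": %s, "parsing_mode": "%s", ' "$1" "$2" "$3"
  printf '"left_cheek_width": %s, "right_cheek_width": %s\n' "$4" "$5"
}

# Usage: start from the defaults and change one value at a time.
# extra=$(navtalk_face_params 3 10 jaw 90 90) || exit 1
# curl -X POST "https://app.navtalk.ai/generate" \
#   -H "Content-Type: application/json" \
#   -d "{\"license\": \"sk-xxx\", \"video_url\": \"https://example.com/video.mp4\", \"audio_url\": \"https://example.com/audio.mp3\", $extra}"
```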

API Call Overview

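| Number | Scene | Use Video / Image | Use Audio | Use System Role | Use TTS |
|---|---|---|---|---|---|
| ① | Video + Audio URL | ✅ | ✅ URL | ❌ | ❌ |
| ② | Video + Audio Base64 | ✅ | ✅ Base64 | ❌ | ❌ |
| ③ | Video + Text (TTS) | ✅ | ❌ | ❌ | ✅ |
| ④ | Image + Text (TTS) | ✅ (image_url) | ❌ | ❌ | ✅ |
| ⑤ | Image + Audio URL | ✅ (image_url) | ✅ URL | ❌ | ❌ |
| ⑥ | Image + Audio Base64 | ✅ (image_url) | ✅ Base64 | ❌ | ❌ |
| ⑦ | System Role + Audio URL | ❌ | ✅ URL | ✅ | ❌ |
| ⑧ | System Role + Audio Base64 | ❌ | ✅ Base64 | ✅ | ❌ |
| ⑨ | System Role + Text (TTS) | ❌ | ❌ | ✅ | ✅ |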

General Parameter Description

| Parameter name | Type | Explanation |
|---|---|---|
| license | string | API license key, obtained from the console |
| video_url | string | Publicly accessible video URL (MP4 or MOV) |
| image_url | string | URL of the image that drives a static digital human |
| audio_url | string | Audio URL (MP3 or WAV) |
| audio_base64 | string | Base64-encoded contents of a local audio file |
| content | string | Text to be synthesized into speech (TTS) |
| voice | string | TTS voice style (see the voice list below) |
| character_name | string | Name of a built-in system character (for example, girl2) |
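Each of the nine documented request types combines license, exactly one visual source (video_url, image_url, or character_name), and exactly one speech input (audio_url, audio_base64, or content). The hypothetical checker below takes the names of the fields in a request and reports which combination it matches. Every TTS example sends voice together with content, so the sketch also requires voice in that case. That rule is an assumption drawn from the examples, not a documented constraint.

```bash
# Hypothetical checker (not a NavTalk API): given the field names present in
# a request, print the matching scene from the overview table, or fail.
navtalk_request_type() {
  local field driver="" speech="" n_driver=0 n_speech=0 license=0 voice=0
  for field in "$@"; do
    case "$field" in
      license)        license=1 ;;
      video_url)      driver="Video";       n_driver=$((n_driver + 1)) ;;
      image_url)      driver="Image";       n_driver=$((n_driver + 1)) ;;
      character_name) driver="System Role"; n_driver=$((n_driver + 1)) ;;
      audio_url)      speech="Audio URL";    n_speech=$((n_speech + 1)) ;;
      audio_base64)   speech="Audio Base64"; n_speech=$((n_speech + 1)) ;;
      content)        speech="Text (TTS)";   n_speech=$((n_speech + 1)) ;;
      voice)          voice=1 ;;
      *) ;;   # other fields (e.g. advanced face controls) are not checked here
    esac
  done
  [ "$license" -eq 1 ] || { echo "error: license is required" >&2; return 1; }
  [ "$n_driver" -eq 1 ] ||
    { echo "error: need exactly one of video_url, image_url, character_name" >&2; return 1; }
  [ "$n_speech" -eq 1 ] ||
    { echo "error: need exactly one of audio_url, audio_base64, content" >&2; return 1; }
  if [ "$speech" = "Text (TTS)" ] && [ "$voice" -eq 0 ]; then
    echo "error: content is paired with voice in every TTS example" >&2
    return 1
  fi
  echo "$driver + $speech"
}

# Usage:
# navtalk_request_type license image_url content voice   # Image + Text (TTS)
```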

Supported Voice Styles

| Name | Style |
|---|---|
| alloy | Neutral and authoritative |
| echo | Casual and friendly |
| fable | Warm and narrative |
| onyx | Deep and dramatic |
| nova | Energetic and enthusiastic |
| shimmer | Dreamy and light-hearted |
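A misspelled voice name would otherwise only surface after a task is submitted. The hypothetical helpers below reject unknown names locally and build a "system role + text (TTS)" body with the text escaped for JSON. The function names are illustrative, and the escaping covers only backslashes and double quotes. Multi-line text would need further handling.

```bash
# Hypothetical helpers, not part of the NavTalk API.
# navtalk_valid_voice: succeed only for the six documented voice names.
navtalk_valid_voice() {
  case "$1" in
    alloy|echo|fable|onyx|nova|shimmer) return 0 ;;
    *)
      echo "error: unknown voice '$1' (expected alloy, echo, fable, onyx, nova or shimmer)" >&2
      return 1 ;;
  esac
}

# navtalk_tts_body: print a "system role + text (TTS)" request body.
navtalk_tts_body() {
  # $1: license key, $2: character_name, $3: text to speak, $4: voice
  local text
  navtalk_valid_voice "$4" || return 1
  # Escape backslashes first, then double quotes, so the text is a valid
  # JSON string. Newlines and other control characters are not handled.
  text=$(printf '%s' "$3" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"license": "%s", "character_name": "%s", "content": "%s", "voice": "%s"}\n' \
    "$1" "$2" "$text" "$4"
}

# Usage:
# navtalk_tts_body sk-xxx girl2 "Welcome to NavTalk." fable |
#   curl -X POST "https://app.navtalk.ai/generate" \
#     -H "Content-Type: application/json" --data-binary @-
```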

Usage Tips

The advanced face control parameters (bbox_shift, extra_margin, parsing_mode, left_cheek_width, right_cheek_width) are optional; any that are omitted fall back to their defaults automatically.
For most standard use cases, the default values produce good results.
Adjust them only if you observe issues such as a face crop that is too tight or too loose, or visible seams along the cheeks.


Last updated 4 months ago
