Text-to-Speech API

Last Updated: 2026-05-11

Speech Synthesis

Service Overview

The service converts input text into natural-sounding speech audio using an AI speech engine.

The Speech Synthesis API uses a console-based project access model. Register and activate the service on the ViiTor AI official website (https://www.viitor.com/), then create a project in the console to obtain the required authentication credentials.

The service endpoint is exposed through your project's gateway. The examples in this document use the following gateway:

https://video-translation.ilivedata.com

Integration

Parameter Specification

Request URL: https://video-translation.ilivedata.com/speechSynthesis/textToSpeech
Method: POST
Content-Type: application/json
Response Format: {code, message, data}

HTTP Headers

Header	Required	Type	Description
`Content-Type`	Yes	String	`application/json;charset=UTF-8`
`Accept`	Yes	String	`application/json;charset=UTF-8`
`Authorization`	Yes	String	Login token (Bearer Token)
`X-User-Id`	Yes	Long	User ID. Overrides any `userId` in the request body.
`X-Channel`	No	Integer	Channel code. Defaults to `0` if missing or invalid.
`X-App-Source`	No	String	Source marker. Ignored when `X-Channel` is `100`.
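A minimal Python (3.9+) sketch of assembling these headers. `build_headers` is a hypothetical helper, not part of an official SDK. It sends `X-App-Source` only when the header would take effect, that is, when the channel is not `100` (an omitted channel defaults to `0` on the server).

```python
from typing import Optional

# Per the header table, X-App-Source is ignored when X-Channel is 100.
IGNORE_APP_SOURCE_CHANNEL = 100


def build_headers(token: str, user_id: int, channel: Optional[int] = None,
                  app_source: Optional[str] = None) -> dict[str, str]:
    """Build request headers following the header table (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "Accept": "application/json;charset=UTF-8",
        "Authorization": f"Bearer {token}",
        # The server treats this header as the authoritative user ID.
        "X-User-Id": str(user_id),
    }
    if channel is not None:
        headers["X-Channel"] = str(channel)
    # An omitted channel defaults to 0 server-side, so X-App-Source still applies.
    effective_channel = 0 if channel is None else channel
    if app_source is not None and effective_channel != IGNORE_APP_SOURCE_CHANNEL:
        headers["X-App-Source"] = app_source
    return headers
```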

Notes:

The server treats the `X-User-Id` header, not the request body, as the source of truth for the user ID.


Request Body

Field Definitions

Field	Type	Required	Description
`sourceText`	String	Yes	Input text to synthesize. Maximum 5000 characters; any text beyond this limit is truncated.
`targetLanguage`	String	No	Target language. If provided, must be one of the supported languages.
`voiceName`	String	Conditionally required	Voice name. At least one of `voiceName` or `timbreNumber` is required.
`common`	Integer	No	Voice scope marker (public/private).
`speedFactor`	Float	No	Speech rate, range `[0.5, 2]`, default `1.0`.
`volume`	Float	No	Volume, range `[-60, 20]`, default `0`.
`emotion`	Integer	No	Emotion parameter, range `[0, 6]`, default `0`.
`selectedEngine`	Integer	No	Preferred synthesis engine.
`format`	String	No	Output format: `pcm`, `wav`, or `mp3`. Default: `wav`.
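One way to model this body on the client is a dataclass whose field names mirror the JSON keys. This is a hypothetical sketch, not an official SDK type; `timbreNumber` is left out because its type is not given in the table, and unset fields are dropped so the server applies its own defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class TextToSpeechRequest:
    """Hypothetical client-side model of the request body table.

    Field names mirror the JSON keys; timbreNumber is omitted because
    its type is not documented.
    """
    sourceText: str
    targetLanguage: Optional[str] = None
    voiceName: Optional[str] = None
    common: Optional[int] = None
    speedFactor: Optional[float] = None
    volume: Optional[float] = None
    emotion: Optional[int] = None
    selectedEngine: Optional[int] = None
    format: Optional[str] = None

    def to_body(self) -> dict[str, Any]:
        # Drop unset fields so the server falls back to its documented defaults.
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Keeping camelCase attribute names is unusual in Python, but it makes the mapping to the JSON body a direct `asdict` call.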

Validation Rules

`sourceText` is required.
At least one of `voiceName` or `timbreNumber` must be provided.
If `targetLanguage` is provided, it must be in the supported language list.
Invalid `format` values fall back to `wav`.
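These rules, together with the ranges in the field table, can be checked before sending a request. The Python sketch below is an assumption-laden illustration: `validate_body` is a hypothetical helper, and because this page does not list the supported languages, the caller must pass that set in.

```python
from typing import Any, Optional

SUPPORTED_FORMATS = {"pcm", "wav", "mp3"}
MAX_SOURCE_CHARS = 5000
# Inclusive ranges taken from the request body table.
RANGES = {"speedFactor": (0.5, 2.0), "volume": (-60.0, 20.0), "emotion": (0, 6)}


def validate_body(body: dict[str, Any],
                  supported_languages: Optional[set[str]] = None) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    text = body.get("sourceText")
    if not text:
        problems.append("sourceText is required")
    elif len(text) > MAX_SOURCE_CHARS:
        # The server truncates rather than rejects; flagging it is a client choice.
        problems.append("sourceText exceeds 5000 characters and would be truncated")
    if not body.get("voiceName") and body.get("timbreNumber") is None:
        problems.append("one of voiceName or timbreNumber is required")
    lang = body.get("targetLanguage")
    if lang is not None and supported_languages is not None and lang not in supported_languages:
        problems.append(f"unsupported targetLanguage: {lang}")
    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be in [{low}, {high}]")
    fmt = body.get("format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        problems.append(f"format {fmt!r} is not supported; the server would fall back to wav")
    return problems
```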

Response Body

Standard Response Schema

Field	Type	Description
`code`	Integer	`0` means success; non-zero means failure
`message`	String	Response message
`data`	Object	Business payload
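A small typed wrapper makes the success check explicit. This Python sketch assumes the response body is the JSON envelope shown above; `Envelope` is a hypothetical name.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Envelope:
    """Hypothetical typed view of the standard {code, message, data} response."""
    code: int
    message: str
    data: Any

    @property
    def ok(self) -> bool:
        # Per the schema, 0 means success and any other value means failure.
        return self.code == 0

    @classmethod
    def parse(cls, raw: str) -> "Envelope":
        obj = json.loads(raw)
        return cls(code=obj["code"], message=obj.get("message", ""), data=obj.get("data"))
```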

`data` Fields on Success

Field	Type	Description
`userId`	Long	User ID
`taskId`	String	Task ID
`targetAudioUrl`	String	Synthesized audio URL
`sourceLanguage`	String	Detected source language
`targetLanguage`	String	Target language
`speeches`	Array	Segmented speech details (if available)
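The success payload can be mapped onto a typed object. In this Python sketch, `TextToSpeechResult` is hypothetical, treating `taskId` and `targetAudioUrl` as always present is an assumption, and `speeches` items are kept as raw dictionaries because their structure is not documented here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TextToSpeechResult:
    """Hypothetical typed view of the success `data` object."""
    task_id: str
    target_audio_url: str
    source_language: Optional[str] = None
    target_language: Optional[str] = None
    # Segment structure is undocumented, so items stay as raw dicts.
    speeches: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_data(cls, data: dict[str, Any]) -> "TextToSpeechResult":
        return cls(
            task_id=data["taskId"],
            target_audio_url=data["targetAudioUrl"],
            source_language=data.get("sourceLanguage"),
            target_language=data.get("targetLanguage"),
            # `speeches` may be absent or null when segment details are unavailable.
            speeches=data.get("speeches") or [],
        )
```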

Examples

cURL Request

curl -X POST "https://video-translation.ilivedata.com/speechSynthesis/textToSpeech" \
  -H "Content-Type: application/json;charset=UTF-8" \
  -H "Accept: application/json;charset=UTF-8" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "X-User-Id: 123456" \
  -H "X-Channel: 100" \
  -d '{
    "sourceText": "Hello, this is a speech synthesis test.",
    "targetLanguage": "en",
    "voiceName": "clone",
    "speedFactor": 1.0,
    "volume": 0,
    "emotion": 0,
    "format": "mp3"
  }'
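The same call can be made from Python's standard library. This sketch only builds the request so it can be inspected without network access; `build_tts_request` is a hypothetical helper, and it sends every header the table marks as required plus `X-Channel: 100` as in the cURL example.

```python
import json
import urllib.request

ENDPOINT = "https://video-translation.ilivedata.com/speechSynthesis/textToSpeech"


def build_tts_request(token: str, user_id: int, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL example."""
    return urllib.request.Request(
        ENDPOINT,
        # Serialize the body as UTF-8 JSON, matching the Content-Type header.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "Accept": "application/json;charset=UTF-8",
            "Authorization": f"Bearer {token}",
            "X-User-Id": str(user_id),
            "X-Channel": "100",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` sends it; the response body is the JSON envelope described under Response Body.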

Success Response Example

{
  "code": 0,
  "message": "OK",
  "data": {
    "userId": 123456,
    "taskId": "ViiTor_AI_XXXXXXXXXXXXXXXXXXXXXXXX",
    "targetAudioUrl": "https://cdn.example.com/tts/result.mp3",
    "sourceLanguage": "en",
    "targetLanguage": "en"
  }
}

Failure Response Example (Invalid Parameters)

{
  "code": 2001,
  "message": "Invalid Parameter",
  "data": null
}

Common Error Codes

Code	Meaning	Typical Cause
`2000`	Missing Parameter	A required field is missing (e.g., `sourceText`)
`2001`	Invalid Parameter	Invalid `speedFactor`, `volume`, `emotion`, `targetLanguage`, or voice parameters
`2107`	voice synthesis failed	Internal synthesis failure
`10091`	insufficient points	The user has insufficient points

Client Integration Recommendations

Validate `sourceText`, `voiceName`, and `format` on the client before sending a request.
Treat every response with `code != 0` as a failure through a single handler, and handle parameter errors (`2000`, `2001`) and insufficient points (`10091`) explicitly.
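One way to follow these recommendations is to route every response through one check that raises a typed exception for each documented failure code. In this Python sketch the exception classes and `check_response` are hypothetical names; the code-to-class mapping comes from the error table, and unknown codes fall back to the base `ApiError`.

```python
from typing import Any


class ApiError(Exception):
    """Base error for any response whose code is not 0."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


class ParameterError(ApiError):
    """2000 / 2001: the request must be corrected before resending."""


class InsufficientPointsError(ApiError):
    """10091: the user does not have enough points."""


class SynthesisError(ApiError):
    """2107: internal synthesis failure."""


_ERROR_TYPES = {
    2000: ParameterError,
    2001: ParameterError,
    2107: SynthesisError,
    10091: InsufficientPointsError,
}


def check_response(envelope: dict[str, Any]) -> Any:
    """Return `data` when code is 0; otherwise raise the matching ApiError subclass."""
    code = envelope.get("code")
    if code == 0:
        return envelope.get("data")
    error_type = _ERROR_TYPES.get(code, ApiError)
    raise error_type(code, envelope.get("message", ""))
```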