Text-to-Speech API


Last Updated: 2026-05-11

Speech Synthesis

Service Overview

  • Converts input text into natural-sounding speech audio using an AI speech engine.

Service Access

The Speech Synthesis API uses a console-based project access model. Register and activate the service on the official ViiTor AI website (https://www.viitor.com/), then create a project in the console to obtain the required authentication credentials.

The business service endpoint is provided by the project gateway. This document uses the following gateway as an example:

https://video-translation.ilivedata.com


Integration

Parameter Specification

  • Request URL: https://video-translation.ilivedata.com/speechSynthesis/textToSpeech
  • Method: POST
  • Content-Type: application/json
  • Response Format: {code, message, data}

HTTP Headers

| Header | Required | Type | Description |
| --- | --- | --- | --- |
| Content-Type | Yes | String | application/json;charset=UTF-8 |
| Accept | Yes | String | application/json;charset=UTF-8 |
| Authorization | Yes | String | Login token (Bearer Token) |
| X-User-Id | Yes | Long | User ID. The server uses this value and overrides any userId in the request body. |
| X-Channel | No | Integer | Channel code. Defaults to 0 if missing or invalid. |
| X-App-Source | No | String | Source marker. Effective only when X-Channel != 100. |

Notes:

  • The server uses X-User-Id from headers as the source of truth.
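The header rules above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name build_headers and its parameters are hypothetical, and only the header names and values come from the table.

```python
from typing import Dict, Optional


def build_headers(
    token: str,
    user_id: int,
    channel: Optional[int] = None,
    app_source: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the HTTP headers listed in the header table (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "Accept": "application/json;charset=UTF-8",
        "Authorization": f"Bearer {token}",
        # The server treats this header as the source of truth for the user ID.
        "X-User-Id": str(user_id),
    }
    # X-Channel is optional; the server defaults it to 0 when it is absent.
    if channel is not None:
        headers["X-Channel"] = str(channel)
    # X-App-Source only takes effect when X-Channel != 100, so omit it otherwise.
    if app_source is not None and channel != 100:
        headers["X-App-Source"] = app_source
    return headers
```

Omitting X-App-Source when the channel is 100 is a client-side choice made here to avoid sending a header the server would ignore.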

Request Body

Field Definitions

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| sourceText | String | Yes | Input text to synthesize. Maximum length is 5000 characters; any excess is truncated. |
| targetLanguage | String | No | Target language. If provided, it must be in the supported language list. |
| voiceName | String | Conditionally required | Voice name. At least one of voiceName or timbreNumber is required. |
| common | Integer | No | Voice scope marker (public/private). |
| speedFactor | Float | No | Speech rate, range [0.5, 2], default 1.0. |
| volume | Float | No | Volume, range [-60, 20], default 0. |
| emotion | Integer | No | Emotion parameter, range [0, 6], default 0. |
| selectedEngine | Integer | No | Preferred synthesis engine. |
| format | String | No | Output format. Supported: pcm/wav/mp3. Default: wav. |

Validation Rules

  1. sourceText is required.
  2. If targetLanguage is provided, it must be in the supported language list.
  3. Invalid format values will fall back to wav.
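The field ranges and validation rules above lend themselves to a client-side check before the request is sent. The sketch below is an assumption-laden illustration: build_request_body is a hypothetical helper, the length limit is counted with Python's len() (the document does not say how characters are counted), timbreNumber is passed through as an opaque value because its type is not specified here, and targetLanguage is left out because the supported language list is not given in this section.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 5000
SUPPORTED_FORMATS = ("pcm", "wav", "mp3")


def build_request_body(
    source_text: str,
    voice_name: Optional[str] = None,
    timbre_number: Optional[Any] = None,  # type not documented here; passed as-is
    speed_factor: float = 1.0,
    volume: float = 0.0,
    emotion: int = 0,
    audio_format: str = "wav",
) -> Dict[str, Any]:
    """Validate inputs against the documented ranges and build the JSON body."""
    if not source_text:
        raise ValueError("sourceText is required")
    # The server truncates long text; failing early avoids silent content loss.
    if len(source_text) > MAX_TEXT_LENGTH:
        raise ValueError("sourceText exceeds 5000 characters")
    if voice_name is None and timbre_number is None:
        raise ValueError("at least one of voiceName or timbreNumber is required")
    if not 0.5 <= speed_factor <= 2:
        raise ValueError("speedFactor must be in [0.5, 2]")
    if not -60 <= volume <= 20:
        raise ValueError("volume must be in [-60, 20]")
    if not 0 <= emotion <= 6:
        raise ValueError("emotion must be in [0, 6]")
    # The server falls back to wav on unknown formats; reject them here instead.
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError("format must be one of pcm, wav, mp3")

    body: Dict[str, Any] = {
        "sourceText": source_text,
        "speedFactor": speed_factor,
        "volume": volume,
        "emotion": emotion,
        "format": audio_format,
    }
    if voice_name is not None:
        body["voiceName"] = voice_name
    if timbre_number is not None:
        body["timbreNumber"] = timbre_number
    return body
```

Rejecting over-long text and unknown formats on the client is stricter than the server, which truncates and falls back to wav; a caller that prefers the server behavior could relax those two checks.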

Response Body

Standard Response Schema

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | 0 means success; non-zero means failure |
| message | String | Response message |
| data | Object | Business payload |

data Fields on Success

| Field | Type | Description |
| --- | --- | --- |
| taskId | String | Task ID |
| targetAudioUrl | String | Synthesized audio URL |
| sourceLanguage | String | Detected source language |
| targetLanguage | String | Target language |
| speeches | Array | Segmented speech details (if available) |
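As an illustration of how the response schema above might be consumed, the following sketch parses the {code, message, data} envelope into a typed object. SynthesisResult and parse_success are hypothetical names; the element structure of speeches is not described here, so it is kept as a list of untyped items.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SynthesisResult:
    """Typed view of the data object on success (hypothetical container)."""
    task_id: str
    target_audio_url: str
    source_language: Optional[str] = None
    target_language: Optional[str] = None
    # Element structure is not documented here, so items stay untyped.
    speeches: List[Any] = field(default_factory=list)


def parse_success(raw: str) -> SynthesisResult:
    """Parse a JSON response body; raise ValueError when code is non-zero."""
    envelope = json.loads(raw)
    if envelope.get("code") != 0:
        raise ValueError(
            f"request failed: code={envelope.get('code')} "
            f"message={envelope.get('message')}"
        )
    data = envelope.get("data") or {}
    return SynthesisResult(
        task_id=data["taskId"],
        target_audio_url=data["targetAudioUrl"],
        source_language=data.get("sourceLanguage"),
        target_language=data.get("targetLanguage"),
        # speeches is only present "if available", so default to an empty list.
        speeches=data.get("speeches") or [],
    )
```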

Examples

cURL Request

curl -X POST "https://video-translation.ilivedata.com/speechSynthesis/textToSpeech" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "X-User-Id: 123456" \
  -H "X-Channel: 100" \
  -d '{
    "sourceText": "Hello, this is a speech synthesis test.",
    "targetLanguage": "en",
    "voiceName": "clone",
    "speedFactor": 1.0,
    "volume": 0,
    "emotion": 0,
    "format": "mp3"
  }'
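For readers integrating from Python, the cURL request above could be approximated with the standard library alone. This is a sketch rather than an official SDK: make_request and send are hypothetical helper names, the 30-second timeout is an arbitrary assumption, and the endpoint is the example gateway used in this document.

```python
import json
import urllib.request

# Example gateway from this document; replace with your project gateway.
ENDPOINT = "https://video-translation.ilivedata.com/speechSynthesis/textToSpeech"


def make_request(token: str, user_id: int, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "Accept": "application/json;charset=UTF-8",
            "Authorization": f"Bearer {token}",
            "X-User-Id": str(user_id),
            "X-Channel": "100",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON envelope (timeout is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like send(make_request("<TOKEN>", 123456, {"sourceText": "Hello", "voiceName": "clone"})), with the returned dictionary carrying the code, message, and data fields.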

Success Response Example

{
  "code": 0,
  "message": "OK",
  "data": {
    "userId": 123456,
    "taskId": "ViiTor_AI_XXXXXXXXXXXXXXXXXXXXXXXX",
    "targetAudioUrl": "https://cdn.example.com/tts/result.mp3",
    "sourceLanguage": "en",
    "targetLanguage": "en"
  }
}

Failure Response Example (Invalid Parameters)

{
  "code": 2001,
  "message": "Invalid Parameter",
  "data": null
}

Common Error Codes

| Code | Meaning | Typical Cause |
| --- | --- | --- |
| 2000 | Missing Parameter | Missing required fields (e.g., sourceText) |
| 2001 | Invalid Parameter | Invalid speed/volume/emotion/language/voice params |
| 10091 | insufficient points | User has insufficient points |
| 2107 | voice synthesis failed | Internal synthesis failure |
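One way to handle the codes in the table above uniformly is to map each to an exception type and fall back to a generic error for anything unlisted. The class names and the unwrap function below are hypothetical; only the numeric codes come from this document.

```python
from typing import Any, Dict


class SpeechSynthesisError(Exception):
    """Base error for any non-zero response code (hypothetical class)."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


class MissingParameterError(SpeechSynthesisError):
    pass


class InvalidParameterError(SpeechSynthesisError):
    pass


class InsufficientPointsError(SpeechSynthesisError):
    pass


class SynthesisFailedError(SpeechSynthesisError):
    pass


# Codes taken from the error table; unlisted codes use the base class.
ERROR_TYPES = {
    2000: MissingParameterError,
    2001: InvalidParameterError,
    10091: InsufficientPointsError,
    2107: SynthesisFailedError,
}


def unwrap(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Return data on success (code == 0); raise a mapped error otherwise."""
    code = envelope.get("code")
    if code == 0:
        return envelope.get("data") or {}
    error_type = ERROR_TYPES.get(code, SpeechSynthesisError)
    raise error_type(code, envelope.get("message") or "")
```

Callers can then catch SpeechSynthesisError for uniform handling, or a subclass such as InsufficientPointsError when a specific case needs its own user-facing message.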

Client Integration Recommendations

  1. Perform client-side pre-validation for sourceText, voiceName, and format.
  2. Handle all code != 0 responses uniformly, especially parameter errors and insufficient points.

© 2026 HighRas Limited