Speech to Text API

The VoxSigma REST API is simple enough that you can integrate our speech-to-text service into your application by adding a single command line to your application script. It can be used with command-line HTTP clients such as cURL, or with HTTP client libraries for C/C++, PHP, Java or JavaScript. The API offers three main processing functions: language identification, speech-to-text conversion, and speech-text alignment.
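As an illustration of what an HTTP client library has to do for a MIME multi-part submission, here is a minimal Python sketch that builds a POST request with a few form fields and one audio file, using only the standard library. The endpoint URL, the field names (function, language, audio) and their values are placeholders chosen for this example, not the actual VoxSigma parameters; the request is built but not sent.

```python
import uuid
import urllib.request


def build_multipart_request(url, fields, file_field, filename, file_bytes):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text form field.
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # One part carrying the raw audio bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder endpoint, field names and audio bytes (assumptions for this sketch).
request = build_multipart_request(
    "https://api.example.com/speech",
    {"function": "transcribe", "language": "en"},
    "audio",
    "sample.wav",
    b"fake-audio-bytes",
)
# To actually submit the request: urllib.request.urlopen(request)
```

With cURL, the same request shape is normally a single command line that passes each field with the -F option.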

VoxSigma API Features

Protocol : REST API over HTTPS
- POST, GET and PUT HTTP methods are accepted
- Both URI-encoded requests and MIME multi-part requests are supported
- Three submission modes: file, streaming, and real-time
Availability : Service available 24/7/365 with failover servers and geographic redundancy
Audio file format : AAC, AIFF, ASF, FLAC, MS-Wave, MPEG, Ogg/Vorbis, NIST SPHERE, Sun AU
Audio type : telephone or broadcast quality; most sampling rates are supported.
Audio duration per request : up to a few hours (depending on the coding rate).
Functions : language identification, audio and speaker segmentation, speech-to-text conversion, and speech-text alignment.
Output : XML data with speaker diarization, language identification tags, word transcription, punctuation, confidence measures, numerical entities and other specific entities.
Special features : on-the-fly language model adaptation, daily updates of language models for broadcast data
Special needs
- Batch processing offered as an online or offline service to process archives [request form]
- Model customization is offered on demand to ensure you get the best possible results for your needs [contact form]
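
Since the output is XML carrying speaker, language and confidence information, a client typically walks the document and filters on those values. The Python sketch below lists the words whose confidence falls under a threshold. The element and attribute names (Transcript, Segment, Word, speaker, lang, conf) and the sample document are invented for this illustration and are not taken from the VoxSigma output schema; adapt them to the XML you actually receive.

```python
import xml.etree.ElementTree as ET

# Hypothetical output document; the real schema may differ.
SAMPLE = """<Transcript>
  <Segment speaker="spk1" lang="eng">
    <Word conf="0.95">hello</Word>
    <Word conf="0.42">word</Word>
  </Segment>
</Transcript>"""


def low_confidence_words(xml_text, threshold=0.5):
    """Return (speaker, word, confidence) for words below the threshold."""
    root = ET.fromstring(xml_text)
    flagged = []
    for segment in root.iter("Segment"):
        for word in segment.iter("Word"):
            conf = float(word.get("conf", "0"))
            if conf < threshold:
                text = (word.text or "").strip()
                flagged.append((segment.get("speaker"), text, conf))
    return flagged


flagged = low_confidence_words(SAMPLE)
```

Words flagged this way are natural candidates for manual review or for highlighting in a transcript editor.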

Pricing

We offer various usage plans : pay as you go, daily plan, batch plan, ...
For our generic systems and large quantities, the price is on the order of 0.01 euro (or $0.01) per minute.
Note that our pricing is based on speech duration, i.e. silences are not counted and there is no minimum cost per submission.
We offer free trials upon request.
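
To make the speech-duration rule concrete, here is a small Python sketch that estimates the cost of one submission. It assumes the indicative rate of 0.01 per minute quoted above and pro-rata billing by the second; the actual rate and rounding depend on your usage plan, and the durations are made-up example values.

```python
def estimate_cost(speech_seconds, rate_per_minute=0.01):
    """Estimate cost from speech duration only (silences excluded)."""
    return speech_seconds / 60.0 * rate_per_minute


# Example values (assumptions): a one-hour recording with ten minutes of silence.
recording_seconds = 3600
silence_seconds = 600
cost = estimate_cost(recording_seconds - silence_seconds)
print(f"Estimated cost: {cost:.2f}")
```

Here only the 50 minutes of speech are counted, so the estimate is 50 x 0.01 = 0.50 rather than the 0.60 that the full hour would give.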
