
Cartesia Sonic
Cartesia Sonic 3 is a streaming text-to-speech (TTS) model that converts written text into realistic, expressive speech at very low latency, with audio typically starting in about 90 milliseconds. It serves as the "vocal cords" for AI agents, enabling fluid, natural voice conversations with human-like emotion, tone, and even laughter.
Pricing
Free
$/mo
- Get introduced to ultra-low latency voice AI through core models and your own voice agent
- 20K credits for models
- 1 prepaid for agents
- Personal use
- Discord support
Pro
$/mo
- Upgrade for instant voice cloning and to try voice AI in production for commercial use
- 100K credits for models
- 5 prepaid for agents
- Instant voice cloning
- Commercial use
Startup
$/mo
- For teams that are starting to use voice AI in production and need shared API keys, pro voice cloning, and multiple agents
- 1.25M credits for models
- 49 prepaid for agents
- Pro voice cloning
- Organizations
Scale
$/mo
- For businesses with large-scale use cases that require high concurrency and multiple agents
- 8M credits for models
- 299 prepaid for agents
- Priority support
- High concurrency limits
Enterprise
$/mo
- Custom supported models and agents with mission-critical guarantees for uptime, security, and compliance
- Custom usage pricing
- Custom concurrency
- Priority and Enterprise support via Slack
- Enterprise-grade security & compliance
- Single Sign-On (SSO)
- PCI compliance
- Custom SLAs
- Custom security review
- HIPAA compliance
Details
Pricing Tier
Freemium
Categories
Developer Tools, Data Analysis
Target Audience
General Public, Startups