Cartesia
Bundled by TypeWhisper
About
Cartesia adds a macOS cloud provider for speech-to-text and text-to-speech. The TypeWhisper plugin uses Cartesia’s batch Speech-to-Text API with ink-whisper and the Text-to-Speech Bytes API with sonic-3.5.
TypeWhisper does not include a shared Cartesia credential. You provide your own Cartesia API key, TypeWhisper stores it through plugin secret storage, and audio or text is sent directly to Cartesia when you choose this provider.
Sources: Cartesia API conventions, Cartesia batch Speech-to-Text, Cartesia Text-to-Speech Bytes, Cartesia plugin source, TypeWhisper plugin registry
Features
- Cloud speech-to-text with Cartesia ink-whisper
- Explicit spoken-language selection for the source audio
- Word timestamps returned as TypeWhisper transcription segments
- API key validation in plugin settings
- Secure API key storage through TypeWhisper plugin secrets
- Text-to-speech playback with Cartesia sonic-3.5
- Cartesia voice refresh plus default and custom voice ID selection
- No TypeWhisper translation support
Requirements & Privacy
| Requirement | Details | Why it matters |
|---|---|---|
| Platform | macOS 14.0 or newer. | The current TypeWhisper integration is a macOS plugin. |
| TypeWhisper host | 1.4.0 or newer. | This is the minHostVersion in the plugin manifest. |
| SDK compatibility | v1. | The release is published through the compatible plugin registry path. |
| Credential | A Cartesia API key from your Cartesia account. | The plugin sends it as Authorization: Bearer <api_key> on Cartesia API requests. |
| OAuth scopes | None. | This plugin uses a user-supplied API key, not Cartesia OAuth. |
| Network use | Audio is uploaded to api.cartesia.ai for STT, and text is sent to api.cartesia.ai for TTS. | This is a cloud provider, not local transcription or local speech synthesis. |
| Credential ownership | User supplied only. | TypeWhisper does not proxy requests through a TypeWhisper-owned Cartesia account. |
Sources: Cartesia API conventions, Cartesia API keys
Get a Cartesia API Key
- Create or sign in to a Cartesia account.
- Open the Cartesia API keys page at play.cartesia.ai/keys.
- Create a standard API key for TypeWhisper.
- Copy the key and keep it private. Rotate it in Cartesia if it is lost or exposed.
- In TypeWhisper, open Settings > Plugins > Cartesia and paste the API key.
- Wait for the plugin to validate the key before selecting Cartesia for dictation or text-to-speech.
Cartesia also has admin API keys for management endpoints. TypeWhisper does not need an admin API key for transcription or text-to-speech.
Sources: Cartesia API conventions, Cartesia API keys, Cartesia admin API key notes
Speech-to-Text
The plugin sends WAV audio to Cartesia’s batch /stt endpoint with model=ink-whisper, requests word timestamps, and includes a language field for the source audio language.
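As a rough illustration of the request described above, the following Python sketch assembles (but does not send) a multipart /stt request. The host, model name, language field, and Authorization header come from this page. The form field names `file` and `timestamp_granularities[]`, the upload file name, and the helper name `build_stt_request` are assumptions made for illustration and are not taken from the plugin source. Cartesia's API conventions may also call for a version header, which is omitted here.

```python
import uuid
import urllib.request
from typing import Optional

STT_URL = "https://api.cartesia.ai/stt"  # host and path as named on this page


def build_stt_request(api_key: str, wav_bytes: bytes, language: str,
                      boundary: Optional[str] = None) -> urllib.request.Request:
    """Assemble a multipart/form-data request without sending it."""
    boundary = boundary or uuid.uuid4().hex
    fields = [
        ("model", "ink-whisper"),
        ("language", language),
        # Assumed field name for requesting word timestamps.
        ("timestamp_granularities[]", "word"),
    ]
    parts = []
    for name, value in fields:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # Assumed field name "file"; the WAV payload is appended as raw bytes.
    parts.append(
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode("utf-8")
        + wav_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        STT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload. Building the request separately keeps the sketch inspectable without network access or a real key.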
Cartesia documents language as the input audio language in ISO-639-1 format. TypeWhisper normalizes profile, workflow, and plugin language selections to Cartesia’s primary language code, so de-DE becomes de and en-US becomes en.
| TypeWhisper input | Cartesia request |
|---|---|
| Profile or request language set to German | Sends language=de. |
| Profile or request language set to English | Sends language=en. |
| Language hints available, but no exact profile language | Uses the first supported language hint. |
| No supported profile or hint language | Uses the Cartesia plugin’s Spoken Language setting. |
| No valid configured language | Defaults to English and sends language=en. |
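The resolution order in the table above can be sketched in Python as follows. This is an illustrative approximation, not the plugin's code: the function names, the parameter names, and the shortened `SUPPORTED` set are assumptions, and the real plugin uses Cartesia's full ink-whisper language list.

```python
from typing import Iterable, Optional

# Illustrative subset only; the plugin exposes the full ink-whisper list.
SUPPORTED = {"en", "de", "ru", "es", "fr", "ja", "pt", "uk", "zh"}


def primary_code(tag: Optional[str]) -> Optional[str]:
    """Reduce a tag such as 'de-DE' to 'de'; return None if unsupported."""
    if not tag:
        return None
    code = tag.replace("_", "-").split("-")[0].strip().lower()
    return code if code in SUPPORTED else None


def resolve_language(profile_language: Optional[str] = None,
                     hints: Iterable[str] = (),
                     plugin_setting: Optional[str] = None) -> str:
    """Apply the fallback order: profile, first supported hint, setting, 'en'."""
    code = primary_code(profile_language)
    if code:
        return code
    for hint in hints:
        code = primary_code(hint)
        if code:
            return code
    return primary_code(plugin_setting) or "en"
```

For example, `resolve_language("de-DE")` yields `de`, and a call with no usable input falls through to `en`.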
The Cartesia plugin does not advertise TypeWhisper’s translation capability. If an older local setting still contains an English translation flag, the plugin ignores it and continues sending the resolved spoken-language code.
Sources: Cartesia batch Speech-to-Text, Cartesia plugin source
Supported Recognition Languages
The plugin exposes Cartesia’s ink-whisper language list in TypeWhisper, including English (en), German (de), Russian (ru), Spanish (es), French (fr), Japanese (ja), Portuguese (pt), Ukrainian (uk), Chinese (zh), and many more.
Use an explicit spoken language when you know the audio language. It avoids relying on provider defaults and makes multilingual profiles more predictable.
Sources: Cartesia batch Speech-to-Text
Short and Long Audio Behavior
Cartesia’s batch STT endpoint accepts audio files directly, and Cartesia documents that long files do not need to be split manually because the service chunks long audio server-side. The current TypeWhisper plugin therefore uses the same batch REST request for both short and longer recordings.
| Recording | TypeWhisper behavior | Cartesia flow |
|---|---|---|
| Short clips | Sends one multipart /stt request. | Upload WAV audio, model=ink-whisper, selected language, word timestamps. |
| Longer clips | Sends the same multipart /stt request. | Cartesia handles chunking on the server side. |
There is no separate async upload, task polling, or result download flow in the Cartesia plugin.
Sources: Cartesia batch Speech-to-Text
Text-to-Speech
The plugin uses Cartesia’s /tts/bytes endpoint with sonic-3.5 and requests raw pcm_s16le audio at 44.1 kHz for playback.
| Setting | What it controls |
|---|---|
| Text-to-Speech Voice | Selects a voice from Cartesia’s voice list after the plugin refreshes available voices. |
| Refresh | Fetches up to 100 Cartesia voices available to your account and stores the fetched list locally. |
| Custom Voice ID | Lets you paste a voice ID manually when the voice is not in the fetched list yet. |
| Language | TypeWhisper passes a TTS language when the request or selected voice provides one that Cartesia supports. |
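To make the audio format concrete, here is a minimal Python sketch of a /tts/bytes request body and of wrapping the returned raw PCM in a WAV container for playback or inspection. The model name, encoding, and sample rate come from this page. The JSON field names (`model_id`, `transcript`, `voice`, `output_format`), the mono channel count, and both helper names are assumptions for illustration and may not match the plugin source.

```python
import io
import json
import wave
from typing import Optional

SAMPLE_RATE = 44_100   # 44.1 kHz, as stated above
SAMPLE_WIDTH = 2       # pcm_s16le: 16-bit samples, 2 bytes each


def build_tts_payload(text: str, voice_id: str,
                      language: Optional[str] = None) -> bytes:
    """Serialize an assumed /tts/bytes JSON body."""
    payload = {
        "model_id": "sonic-3.5",
        "transcript": text,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": SAMPLE_RATE,
        },
    }
    if language:  # only sent when a supported language is known
        payload["language"] = language
    return json.dumps(payload).encode("utf-8")


def pcm_to_wav(pcm: bytes, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV header (mono assumed)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(SAMPLE_WIDTH)
        writer.setframerate(SAMPLE_RATE)
        writer.writeframes(pcm)
    return buffer.getvalue()
```

Raw PCM carries no header, so a player must be told the sample rate, sample width, and channel count out of band. Wrapping the bytes in WAV is one simple way to record those values alongside the audio.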
Sources: Cartesia Text-to-Speech Bytes, Cartesia Sonic 3.5
Configuration
- API Key - Paste a standard Cartesia API key. TypeWhisper validates it against Cartesia’s voices endpoint and stores it through plugin secret storage.
- Remove - Clears the stored Cartesia API key from TypeWhisper.
- Spoken Language - Select the source audio language for STT.
- Text-to-Speech Voice - Pick a fetched Cartesia voice or use the built-in default voice.
- Refresh - Fetches the current Cartesia voice list available to your account.
- Custom Voice ID - Uses a manually pasted Cartesia voice ID for TTS.
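The key validation and voice refresh described above both rely on Cartesia’s voices endpoint. The Python sketch below shows one way such a check could be structured. The `/voices` path, the `limit` query parameter, the status-code mapping, and the helper names are assumptions for illustration; only the host, the Bearer header, and the 100-voice cap come from this page.

```python
import urllib.request
from urllib.parse import urlencode

VOICES_URL = "https://api.cartesia.ai/voices"  # path is an assumption


def build_voices_request(api_key: str, limit: int = 100) -> urllib.request.Request:
    """Build (but do not send) a GET request for the account's voice list."""
    query = urlencode({"limit": limit})  # assumed parameter name
    return urllib.request.Request(
        f"{VOICES_URL}?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def classify_key_check(status: int) -> str:
    """Map an HTTP status to a coarse validation result (assumed mapping)."""
    if 200 <= status < 300:
        return "valid"
    if status in (401, 403):
        return "invalid"
    return "unknown"  # e.g. rate limiting or a server error: retry later
```

Treating non-authentication failures as "unknown" rather than "invalid" avoids rejecting a good key during a temporary outage.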
Setup
- Create a Cartesia account and generate a standard API key.
- Open TypeWhisper Settings > Plugins.
- Find Cartesia and click Configure.
- Paste the API key and wait for validation.
- Choose the Spoken Language that matches your source audio.
- Click Refresh if you want TypeWhisper to fetch the voices available to your Cartesia account.
- Select a Text-to-Speech Voice or paste a Custom Voice ID.
- Select Cartesia as your transcription or text-to-speech provider in Settings or in a profile.
Notes
- The plugin is cloud-only and sends audio or text to Cartesia; use a local engine when content must stay on device.
- Cartesia rate limits, billing, and available voices are controlled by your Cartesia account.
- The public 1.0.0 release requires TypeWhisper host version 1.4.0 or newer, SDK compatibility v1, and macOS 14.0 or newer.