Cartesia - TypeWhisper Add-ons

About

Cartesia adds a macOS cloud provider for speech-to-text and text-to-speech. The TypeWhisper plugin uses Cartesia’s batch Speech-to-Text API with ink-whisper and the Text-to-Speech Bytes API with sonic-3.5.

TypeWhisper does not include a shared Cartesia credential. You provide your own Cartesia API key, TypeWhisper stores it through plugin secret storage, and audio or text is sent directly to Cartesia when you choose this provider.

Sources: Cartesia API conventions, Cartesia batch Speech-to-Text, Cartesia Text-to-Speech Bytes, Cartesia plugin source, TypeWhisper plugin registry

Features

Cloud speech-to-text with Cartesia ink-whisper
Explicit spoken-language selection for the source audio
Word timestamps returned as TypeWhisper transcription segments
API key validation in plugin settings
Secure API key storage through TypeWhisper plugin secrets
Text-to-speech playback with Cartesia sonic-3.5
Cartesia voice refresh plus default and custom voice ID selection
No TypeWhisper translation support

Requirements & Privacy

Requirement	Details	Why it matters
Platform	macOS `14.0` or newer.	The current TypeWhisper integration is a macOS plugin.
TypeWhisper host	`1.4.0` or newer.	This is the `minHostVersion` in the plugin manifest.
SDK compatibility	`v1`.	The release is published through the compatible plugin registry path.
Credential	A Cartesia API key from your Cartesia account.	The plugin sends it as `Authorization: Bearer <api_key>` on Cartesia API requests.
OAuth scopes	None.	This plugin uses a user-supplied API key, not Cartesia OAuth.
Network use	Audio is uploaded to `api.cartesia.ai` for STT, and text is sent to `api.cartesia.ai` for TTS.	This is a cloud provider, not local transcription or local speech synthesis.
Credential ownership	User supplied only.	TypeWhisper does not proxy requests through a TypeWhisper-owned Cartesia account.

Sources: Cartesia API conventions, Cartesia API keys

Get a Cartesia API Key

Create or sign in to a Cartesia account.
Open the Cartesia API keys page at play.cartesia.ai/keys.
Create a standard API key for TypeWhisper.
Copy the key and keep it private. Rotate it in Cartesia if it is lost or exposed.
In TypeWhisper, open Settings > Plugins > Cartesia and paste the API key.
Wait for the plugin to validate the key before selecting Cartesia for dictation or text-to-speech.
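The validation step above can be sketched in Python. The plugin's own code is a macOS plugin, so this is not its source. It only shows the shape of a key check: a GET to Cartesia's voices endpoint carrying the user's key as a Bearer token, which is what the Configuration section describes. Several parts are assumptions: the `Cartesia-Version` header value, the `limit=1` query parameter, and the helper names `build_validation_request` and `classify_validation_status` together with their status labels.

```python
from urllib.request import Request

API_BASE = "https://api.cartesia.ai"
# Assumption: Cartesia expects a dated version header; confirm the value in its API conventions.
CARTESIA_VERSION = "2025-04-16"


def build_validation_request(api_key: str) -> Request:
    """Build (but do not send) a GET to the voices endpoint using the user's own key."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return Request(
        # Hypothetical: ask for a single voice so the check stays cheap.
        f"{API_BASE}/voices?limit=1",
        method="GET",
        headers={
            "Authorization": f"Bearer {key}",
            "Cartesia-Version": CARTESIA_VERSION,
        },
    )


def classify_validation_status(status: int) -> str:
    """Map the HTTP status of the check to a settings-UI state (hypothetical labels)."""
    if 200 <= status < 300:
        return "valid"
    if status in (401, 403):
        return "invalid_key"
    if status == 429:
        return "rate_limited"
    return "error"
```

To run the real check, pass the request to `urllib.request.urlopen`. A rejected key raises `urllib.error.HTTPError`, and its `.code` can be passed to `classify_validation_status`.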

Cartesia also has admin API keys for management endpoints. TypeWhisper does not need an admin API key for transcription or text-to-speech.

Sources: Cartesia API conventions, Cartesia API keys, Cartesia admin API key notes

Speech-to-Text

The plugin sends WAV audio to Cartesia’s batch `/stt` endpoint with `model=ink-whisper`, requests word timestamps, and includes a `language` field for the source audio language.
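The request described above is a single multipart upload, and it can be sketched with the Python standard library. This is a minimal sketch and not the plugin's implementation. It builds the request without sending it. The form field names `file` and `timestamp_granularities[]`, along with the `Cartesia-Version` value, are assumptions to check against Cartesia's batch Speech-to-Text reference. `build_stt_request` is a hypothetical helper.

```python
import uuid
from urllib.request import Request

API_BASE = "https://api.cartesia.ai"
# Assumption: Cartesia expects a dated version header; confirm the value in its API conventions.
CARTESIA_VERSION = "2025-04-16"


def build_stt_request(api_key: str, wav_bytes: bytes, language: str) -> Request:
    """Build (but do not send) a multipart POST to the batch /stt endpoint."""
    boundary = f"typewhisper-sketch-{uuid.uuid4().hex}"
    # Text form fields. The timestamp field name is an assumption for this sketch.
    fields = [
        ("model", "ink-whisper"),
        ("language", language),
        ("timestamp_granularities[]", "word"),
    ]
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The recorded WAV audio travels as a file part.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode("utf-8")
        + wav_bytes
        + b"\r\n"
    )
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return Request(
        f"{API_BASE}/stt",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Cartesia-Version": CARTESIA_VERSION,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the upload.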

Cartesia documents `language` as the language of the input audio, given as an ISO 639-1 code. TypeWhisper normalizes profile, workflow, and plugin language selections to that primary language code, so `de-DE` becomes `de` and `en-US` becomes `en`.
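A minimal sketch of that normalization follows. It assumes tags arrive as BCP 47-style strings that use either `-` or `_` as the separator. `to_cartesia_language` is a hypothetical helper name and not part of the plugin's API. Unlike the plugin, the sketch does not check the result against the supported language list. It only enforces the two-letter ISO 639-1 shape.

```python
from typing import Optional


def to_cartesia_language(tag: str) -> Optional[str]:
    """Reduce a tag such as "de-DE" or "en_US" to its primary two-letter subtag."""
    primary = tag.strip().replace("_", "-").split("-", 1)[0].lower()
    # ISO 639-1 codes are exactly two ASCII letters; anything else is rejected here.
    if len(primary) == 2 and primary.isascii() and primary.isalpha():
        return primary
    return None
```

For example, `to_cartesia_language("zh-Hans-CN")` keeps only `zh`, because script and region subtags are dropped.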

TypeWhisper input	Cartesia request
Profile or request language set to German	Sends `language=de`.
Profile or request language set to English	Sends `language=en`.
Language hints available, but no exact profile language	Uses the first supported language hint.
No supported profile or hint language	Uses the Cartesia plugin’s Spoken Language setting.
No valid configured language	Defaults to English and sends `language=en`.

The Cartesia plugin does not advertise TypeWhisper’s translation capability. If an older local setting still contains an English translation flag, the plugin ignores it and keeps sending the resolved spoken-language code.
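The fallback order in the table above, plus the ignored translation flag, can be written as one resolution function. This is a sketch under stated assumptions. `SUPPORTED` is a hypothetical subset of the ink-whisper list, while the real plugin uses Cartesia's full list. The function and parameter names are also illustrative.

```python
from typing import Iterable, Optional

# Hypothetical subset; the plugin exposes Cartesia's full ink-whisper list.
SUPPORTED = {"en", "de", "ru", "es", "fr", "ja", "pt", "uk", "zh"}


def _primary(tag: Optional[str]) -> Optional[str]:
    """Return the primary subtag if Cartesia supports it, else None."""
    if not tag:
        return None
    code = tag.strip().replace("_", "-").split("-", 1)[0].lower()
    return code if code in SUPPORTED else None


def resolve_stt_language(
    profile_language: Optional[str],
    language_hints: Iterable[str],
    plugin_setting: Optional[str],
    translate_to_english: bool = False,
) -> str:
    """Pick the `language` form value for a transcription request."""
    # A stale translation flag is accepted but has no effect.
    del translate_to_english
    # 1. An explicit profile or request language wins.
    if (code := _primary(profile_language)) is not None:
        return code
    # 2. Otherwise, use the first hint Cartesia supports.
    for hint in language_hints:
        if (code := _primary(hint)) is not None:
            return code
    # 3. Then use the plugin's Spoken Language setting.
    if (code := _primary(plugin_setting)) is not None:
        return code
    # 4. Finally, fall back to English.
    return "en"
```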

Sources: Cartesia batch Speech-to-Text, Cartesia plugin source

Supported Recognition Languages

The plugin exposes Cartesia’s ink-whisper language list in TypeWhisper, including English (en), German (de), Russian (ru), Spanish (es), French (fr), Japanese (ja), Portuguese (pt), Ukrainian (uk), Chinese (zh), and many more.

Use an explicit spoken language when you know the audio language. It avoids relying on provider defaults and makes multilingual profiles more predictable.

Sources: Cartesia batch Speech-to-Text

Short and Long Audio Behavior

Cartesia’s batch STT endpoint accepts audio files directly. Cartesia documents that long files do not need to be split manually, because the service chunks long audio server-side. The current TypeWhisper plugin therefore uses the same batch REST request for both short and longer recordings.

Recording	TypeWhisper behavior	Cartesia flow
Short clips	Sends one multipart `/stt` request.	Upload WAV audio, `model=ink-whisper`, selected `language`, word timestamps.
Longer clips	Sends the same multipart `/stt` request.	Cartesia handles chunking on the server side.

There is no separate async upload, task polling, or result download flow in the Cartesia plugin.

Sources: Cartesia batch Speech-to-Text

Text-to-Speech

The plugin uses Cartesia’s `/tts/bytes` endpoint with `sonic-3.5` and requests raw `pcm_s16le` audio at 44.1 kHz for playback.
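Raw PCM has no file header, so playback must already know the format. Here is a worked example. At 44,100 samples per second and 2 bytes per `pcm_s16le` sample, one second of mono audio is 44,100 × 2 = 88,200 bytes. The sketch below assumes mono output, because the channel count is not stated above. The helper names are hypothetical.

```python
import struct
from typing import List

SAMPLE_RATE_HZ = 44_100  # the sample rate requested from /tts/bytes
BYTES_PER_SAMPLE = 2     # pcm_s16le: signed 16-bit, little-endian
CHANNELS = 1             # assumption: mono output


def pcm_duration_seconds(byte_count: int) -> float:
    """Playback length of a raw PCM buffer."""
    frame_size = BYTES_PER_SAMPLE * CHANNELS
    if byte_count % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return byte_count / (frame_size * SAMPLE_RATE_HZ)


def decode_pcm_s16le(raw: bytes) -> List[int]:
    """Unpack little-endian signed 16-bit samples, ignoring a trailing odd byte."""
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", raw[: count * BYTES_PER_SAMPLE]))
```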

Setting	What it controls
Text-to-Speech Voice	Selects a voice from Cartesia’s voice list after the plugin refreshes available voices.
Refresh	Fetches up to 100 Cartesia voices available to your account and stores the fetched list locally.
Custom Voice ID	Lets you paste a voice ID manually when the voice is not in the fetched list yet.
Language	TypeWhisper passes a TTS language when the request or selected voice provides one that Cartesia supports.
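The settings above end up in a single JSON request. The sketch below builds a `/tts/bytes` request for `sonic-3.5` with raw `pcm_s16le` output at 44,100 Hz. It adds `language` only when one is known. Several parts are assumptions rather than confirmed plugin behavior: the body field names, which follow Cartesia's Text-to-Speech Bytes reference as best understood here; the `Cartesia-Version` value; and the voice precedence in `resolve_voice_id`, where a custom ID wins over the picked voice, which wins over the default. The default voice ID is passed in rather than guessed.

```python
import json
from typing import Optional
from urllib.request import Request

API_BASE = "https://api.cartesia.ai"
# Assumption: Cartesia expects a dated version header; confirm the value in its API conventions.
CARTESIA_VERSION = "2025-04-16"


def resolve_voice_id(custom_id: str, selected_id: Optional[str], default_id: str) -> str:
    """Assumed precedence: pasted custom ID, then picked voice, then built-in default."""
    if custom_id.strip():
        return custom_id.strip()
    if selected_id:
        return selected_id
    return default_id


def build_tts_request(
    api_key: str, text: str, voice_id: str, language: Optional[str] = None
) -> Request:
    """Build (but do not send) a POST to /tts/bytes asking for raw 16-bit PCM."""
    payload = {
        "model_id": "sonic-3.5",
        "transcript": text,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }
    # Send a language only when the request or voice supplies a supported one.
    if language:
        payload["language"] = language
    return Request(
        f"{API_BASE}/tts/bytes",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Cartesia-Version": CARTESIA_VERSION,
            "Content-Type": "application/json",
        },
    )
```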

Sources: Cartesia Text-to-Speech Bytes, Cartesia Sonic 3.5

Configuration

API Key - Paste a standard Cartesia API key. TypeWhisper validates it against Cartesia’s voices endpoint and stores it through plugin secret storage.
Remove - Clears the stored Cartesia API key from TypeWhisper.
Spoken Language - Select the source audio language for STT.
Text-to-Speech Voice - Pick a fetched Cartesia voice or use the built-in default voice.
Refresh - Fetches the current Cartesia voice list available to your account.
Custom Voice ID - Uses a manually pasted Cartesia voice ID for TTS.
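Refresh fetches up to 100 voices and keeps the list locally. The sketch below covers only the parsing and caching half. It assumes the voices response is either a JSON list or an object with a `data` list, so check Cartesia's voices reference for the actual shape. The cache file and helper names are hypothetical, and the plugin's own storage location is not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

MAX_VOICES = 100  # Refresh fetches at most 100 voices


def parse_voice_list(payload: Any) -> List[Dict[str, str]]:
    """Keep only the id and display name of each voice in a voices response."""
    # Assumption: either a bare list or an object wrapping the list in "data".
    items = payload.get("data", []) if isinstance(payload, dict) else payload
    voices = []
    for item in items[:MAX_VOICES]:
        if isinstance(item, dict) and item.get("id"):
            voices.append({"id": str(item["id"]), "name": str(item.get("name", item["id"]))})
    return voices


def save_voice_cache(voices: List[Dict[str, str]], path: Path) -> None:
    """Persist the fetched list so the picker works without refetching."""
    path.write_text(json.dumps(voices), encoding="utf-8")


def load_voice_cache(path: Path) -> List[Dict[str, str]]:
    """Read the cached list; a missing file means Refresh has not run yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```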

Setup

Create a Cartesia account and generate a standard API key.
Open TypeWhisper Settings > Plugins.
Find Cartesia and click Configure.
Paste the API key and wait for validation.
Choose the Spoken Language that matches your source audio.
Click Refresh if you want TypeWhisper to fetch the voices available to your Cartesia account.
Select a Text-to-Speech Voice or paste a Custom Voice ID.
Select Cartesia as your transcription or text-to-speech provider in Settings or in a profile.

Notes

The plugin is cloud-only and sends audio or text to Cartesia; use a local engine when content must stay on device.
Cartesia rate limits, billing, and available voices are controlled by your Cartesia account.
The public 1.0.0 release requires TypeWhisper host version 1.4.0 or newer, SDK compatibility v1, and macOS 14.0 or newer.