
Cartesia

Bundled

by TypeWhisper

Transcription, Text-to-Speech, macOS

About

Cartesia adds a macOS cloud provider for speech-to-text and text-to-speech. The TypeWhisper plugin uses Cartesia’s batch Speech-to-Text API with ink-whisper and the Text-to-Speech Bytes API with sonic-3.5.

TypeWhisper does not include a shared Cartesia credential. You provide your own Cartesia API key, TypeWhisper stores it through plugin secret storage, and audio or text is sent directly to Cartesia when you choose this provider.

Sources: Cartesia API conventions, Cartesia batch Speech-to-Text, Cartesia Text-to-Speech Bytes, Cartesia plugin source, TypeWhisper plugin registry

Features

  • Cloud speech-to-text with Cartesia ink-whisper
  • Explicit spoken-language selection for the source audio
  • Word timestamps returned as TypeWhisper transcription segments
  • API key validation in plugin settings
  • Secure API key storage through TypeWhisper plugin secrets
  • Text-to-speech playback with Cartesia sonic-3.5
  • Cartesia voice refresh plus default and custom voice ID selection
  • No TypeWhisper translation support

Requirements & Privacy

| Requirement | Details | Why it matters |
| --- | --- | --- |
| Platform | macOS 14.0 or newer. | The current TypeWhisper integration is a macOS plugin. |
| TypeWhisper host | 1.4.0 or newer. | This is the minHostVersion in the plugin manifest. |
| SDK compatibility | v1. | The release is published through the compatible plugin registry path. |
| Credential | A Cartesia API key from your Cartesia account. | The plugin sends it as Authorization: Bearer <api_key> on Cartesia API requests. |
| OAuth scopes | None. | This plugin uses a user-supplied API key, not Cartesia OAuth. |
| Network use | Audio is uploaded to api.cartesia.ai for STT, and text is sent to api.cartesia.ai for TTS. | This is a cloud provider, not local transcription or local speech synthesis. |
| Credential ownership | User supplied only. | TypeWhisper does not proxy requests through a TypeWhisper-owned Cartesia account. |
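
As an illustration of the credential row, the following Python sketch builds the headers for a Cartesia request. The Authorization: Bearer form comes from the table above. The Cartesia-Version header and its date value are assumptions for illustration, and cartesia_headers is a hypothetical helper, not a function from the plugin.

```python
def cartesia_headers(api_key: str, api_version: str = "2025-04-16") -> dict:
    """Build request headers for a Cartesia REST call.

    The Bearer scheme is documented above. The version header name and
    its default value are assumptions; check Cartesia's API conventions
    for the value your account should send.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("A Cartesia API key is required")
    return {
        "Authorization": f"Bearer {key}",
        "Cartesia-Version": api_version,  # assumed header and value
    }
```

Stripping whitespace before building the header guards against a key pasted with a trailing newline.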

Sources: Cartesia API conventions, Cartesia API keys

Get a Cartesia API Key

  1. Create or sign in to a Cartesia account.
  2. Open the Cartesia API keys page at play.cartesia.ai/keys.
  3. Create a standard API key for TypeWhisper.
  4. Copy the key and keep it private. Rotate it in Cartesia if it is lost or exposed.
  5. In TypeWhisper, open Settings > Plugins > Cartesia and paste the API key.
  6. Wait for the plugin to validate the key before selecting Cartesia for dictation or text-to-speech.

Cartesia also has admin API keys for management endpoints. TypeWhisper does not need an admin API key for transcription or text-to-speech.

Sources: Cartesia API conventions, Cartesia API keys, Cartesia admin API key notes

Speech-to-Text

The plugin sends WAV audio to Cartesia’s batch /stt endpoint with model=ink-whisper, requests word timestamps, and includes a language field for the source audio language.
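
A minimal Python sketch of such a request body follows. It only assembles the multipart payload and does not send it. The form field names file and timestamp_granularities[] are assumptions based on common STT API conventions, and build_stt_multipart is a hypothetical helper; the plugin itself may structure the request differently.

```python
import uuid
from typing import Optional, Tuple


def build_stt_multipart(wav_bytes: bytes, language: str,
                        boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart /stt upload."""
    boundary = boundary or uuid.uuid4().hex
    fields = [
        ("model", "ink-whisper"),
        ("language", language),
        ("timestamp_granularities[]", "word"),  # assumed field name
    ]
    chunks = []
    for name, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    # The audio part carries the raw WAV bytes after its own headers.
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + wav_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The returned content type would go into the Content-Type header of a POST to the /stt endpoint, alongside the Authorization header.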

Cartesia documents language as the input audio language in ISO-639-1 format. TypeWhisper normalizes profile, workflow, and plugin language selections to Cartesia’s primary language code, so de-DE becomes de and en-US becomes en.
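
The normalization can be pictured with a short Python sketch. primary_language_code is a hypothetical name, and accepting underscores as separators is an assumption; the plugin's actual rules may differ.

```python
def primary_language_code(tag: str) -> str:
    """Reduce a locale tag such as de-DE to its primary language code."""
    # Accept both de-DE and de_DE style tags (the underscore form is an assumption).
    return tag.strip().replace("_", "-").split("-")[0].lower()


print(primary_language_code("de-DE"))  # de
print(primary_language_code("en-US"))  # en
```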

| TypeWhisper input | Cartesia request |
| --- | --- |
| Profile or request language set to German | Sends language=de. |
| Profile or request language set to English | Sends language=en. |
| Language hints available, but no exact profile language | Uses the first supported language hint. |
| No supported profile or hint language | Uses the Cartesia plugin’s Spoken Language setting. |
| No valid configured language | Defaults to English and sends language=en. |
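
The fallback order in this table can be sketched in Python as below. resolve_language and its parameters are hypothetical names, and SUPPORTED holds only the language codes named on this page rather than the full ink-whisper list.

```python
from typing import Iterable, Optional

# Subset of codes named on this page; the real list is longer.
SUPPORTED = {"en", "de", "ru", "es", "fr", "ja", "pt", "uk", "zh"}


def _primary(tag: Optional[str]) -> str:
    """Normalize a tag like de-DE to de; empty input yields an empty string."""
    if not tag:
        return ""
    return tag.strip().replace("_", "-").split("-")[0].lower()


def resolve_language(profile_language: Optional[str],
                     hints: Iterable[str],
                     plugin_setting: Optional[str],
                     supported=SUPPORTED) -> str:
    """Pick the language code to send, following the table's order."""
    candidate = _primary(profile_language)
    if candidate in supported:
        return candidate
    for hint in hints:  # first supported hint wins
        candidate = _primary(hint)
        if candidate in supported:
            return candidate
    candidate = _primary(plugin_setting)
    if candidate in supported:
        return candidate
    return "en"  # final default
```

For example, resolve_language(None, ["xx", "fr-CA"], "de") returns fr, because the first supported hint takes priority over the plugin setting.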

The Cartesia plugin does not advertise TypeWhisper’s translation capability. If an older local setting still contains an English translation flag, the plugin ignores it and continues sending the resolved spoken-language code.

Sources: Cartesia batch Speech-to-Text, Cartesia plugin source

Supported Recognition Languages

The plugin exposes Cartesia’s ink-whisper language list in TypeWhisper, including English (en), German (de), Russian (ru), Spanish (es), French (fr), Japanese (ja), Portuguese (pt), Ukrainian (uk), Chinese (zh), and many more.

Use an explicit spoken language when you know the audio language. It avoids relying on provider defaults and makes multilingual profiles more predictable.

Sources: Cartesia batch Speech-to-Text

Short and Long Audio Behavior

Cartesia’s batch STT endpoint accepts audio files directly, and Cartesia documents that long files do not need to be split manually because the service chunks long audio server-side. The current TypeWhisper plugin therefore uses the same batch REST request for both short and longer recordings.

| Recording | TypeWhisper behavior | Cartesia flow |
| --- | --- | --- |
| Short clips | Sends one multipart /stt request. | Upload WAV audio, model=ink-whisper, selected language, word timestamps. |
| Longer clips | Sends the same multipart /stt request. | Cartesia handles chunking on the server side. |

There is no separate async upload, task polling, or result download flow in the Cartesia plugin.
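
Because the whole result arrives in one response, turning word timestamps into segments can be a single pass over that response. The Python sketch below assumes a JSON body with a words array whose entries carry word, start, and end keys; that shape, the Segment type, and the one-segment-per-word mapping are illustrative assumptions, not confirmed plugin behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    start: float  # seconds from the start of the audio
    end: float


def words_to_segments(response: dict) -> List[Segment]:
    """Map each timestamped word in the (assumed) response to a segment."""
    segments = []
    for entry in response.get("words") or []:
        text = str(entry.get("word", "")).strip()
        if not text:
            continue  # skip empty tokens
        segments.append(Segment(text, float(entry["start"]), float(entry["end"])))
    return segments
```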

Sources: Cartesia batch Speech-to-Text

Text-to-Speech

The plugin uses Cartesia’s /tts/bytes endpoint with sonic-3.5 and requests raw pcm_s16le audio at 44.1 kHz for playback.
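
A hedged Python sketch of the request payload and of the resulting audio length follows. The JSON key names (model_id, transcript, voice, output_format) are assumptions about the request schema, build_tts_payload and pcm_duration_seconds are hypothetical helpers, and the duration helper assumes mono audio, which this page does not state.

```python
from typing import Optional


def build_tts_payload(text: str, voice_id: str,
                      language: Optional[str] = None) -> dict:
    """Assemble an (assumed) JSON body for a /tts/bytes request."""
    payload = {
        "model_id": "sonic-3.5",
        "transcript": text,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_s16le",  # signed 16-bit little-endian samples
            "sample_rate": 44100,
        },
    }
    if language:
        payload["language"] = language  # only sent when one is available
    return payload


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 44100,
                         channels: int = 1) -> float:
    """Length of raw pcm_s16le audio: 2 bytes per sample per channel."""
    return len(pcm) / (2 * channels * sample_rate)
```

Under the mono assumption, one second of audio at 44.1 kHz occupies 88,200 bytes, so pcm_duration_seconds(bytes(88200)) returns 1.0.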

| Setting | What it controls |
| --- | --- |
| Text-to-Speech Voice | Selects a voice from Cartesia’s voice list after the plugin refreshes available voices. |
| Refresh | Fetches up to 100 Cartesia voices available to your account and stores the fetched list locally. |
| Custom Voice ID | Lets you paste a voice ID manually when the voice is not in the fetched list yet. |
| Language | TypeWhisper passes a TTS language when the request or selected voice provides one that Cartesia supports. |

Sources: Cartesia Text-to-Speech Bytes, Cartesia Sonic 3.5

Configuration

  • API Key - Paste a standard Cartesia API key. TypeWhisper validates it against Cartesia’s voices endpoint and stores it through plugin secret storage.
  • Remove - Clears the stored Cartesia API key from TypeWhisper.
  • Spoken Language - Select the source audio language for STT.
  • Text-to-Speech Voice - Pick a fetched Cartesia voice or use the built-in default voice.
  • Refresh - Fetches the current Cartesia voice list available to your account.
  • Custom Voice ID - Uses a manually pasted Cartesia voice ID for TTS.
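
The API key validation described in this list could look roughly like the Python sketch below. It builds the request without sending it and classifies a status code afterwards. The /voices path, the limit query parameter, the Cartesia-Version header value, and the mapping of 401 and 403 to an invalid key are all assumptions; both function names are hypothetical.

```python
import urllib.request


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Prepare a GET against the (assumed) voices endpoint path."""
    return urllib.request.Request(
        "https://api.cartesia.ai/voices?limit=1",  # assumed path and query
        headers={
            "Authorization": f"Bearer {api_key.strip()}",
            "Cartesia-Version": "2025-04-16",  # assumed header and value
        },
        method="GET",
    )


def classify_key_check(status_code: int) -> str:
    """Interpret the HTTP status of the validation call."""
    if 200 <= status_code < 300:
        return "valid"
    if status_code in (401, 403):
        return "invalid"
    return "unknown"  # e.g. rate limiting or a server error
```

Keeping an unknown outcome separate from invalid avoids discarding a working key because of a transient network or rate-limit error.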

Setup

  1. Create a Cartesia account and generate a standard API key.
  2. Open TypeWhisper Settings > Plugins.
  3. Find Cartesia and click Configure.
  4. Paste the API key and wait for validation.
  5. Choose the Spoken Language that matches your source audio.
  6. Click Refresh if you want TypeWhisper to fetch the voices available to your Cartesia account.
  7. Select a Text-to-Speech Voice or paste a Custom Voice ID.
  8. Select Cartesia as your transcription or text-to-speech provider in Settings or in a profile.

Notes

  • The plugin is cloud-only and sends audio or text to Cartesia; use a local engine when content must stay on device.
  • Cartesia rate limits, billing, and available voices are controlled by your Cartesia account.
  • The public 1.0.0 release requires TypeWhisper host version 1.4.0 or newer, SDK compatibility v1, and macOS 14.0 or newer.