Cerebras
Bundled by TypeWhisper
LLM macOS
About
Cerebras delivers ultra-fast LLM inference using their custom Wafer-Scale Engine (WSE) hardware. The platform achieves speeds of around 3,000 tokens per second for large models like GPT-OSS 120B, making it one of the fastest cloud inference providers available. The plugin uses an OpenAI-compatible API and dynamically fetches available models.
Features
- Ultra-fast inference (~3,000 tokens/sec for GPT-OSS 120B)
- Dynamic model list with refresh from the Cerebras API
- OpenAI-compatible API
- API keys stored securely in the macOS Keychain
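Because the API is OpenAI-compatible, a chat completion is an ordinary POST to a `/chat/completions` route with a bearer token. The sketch below builds such a request with Python's standard library; the base URL is an assumption based on Cerebras's documented endpoint, and the model ID comes from the table on this page.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible base URL for the Cerebras inference API.
BASE_URL = "https://api.cerebras.ai/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example: a request for the Llama 3.1 8B model listed above.
req = build_chat_request(os.environ.get("CEREBRAS_API_KEY", ""), "llama3.1-8b", "Hello")
print(req.full_url)  # → https://api.cerebras.ai/v1/chat/completions
```

Sending the request (`urllib.request.urlopen(req)`) returns the familiar OpenAI-style JSON body with a `choices` array.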
LLM Models
| Model | ID |
|---|---|
| Llama 3.1 8B | llama3.1-8b |
| GPT-OSS 120B | gpt-oss-120b |
| Qwen 3 235B | qwen-3-235b-a22b-instruct-2507 |
| ZAI GLM 4.7 | zai-glm-4.7 |
Use the Refresh button to load the latest models from the Cerebras API.
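The refresh behaves like a standard OpenAI-style model listing: a GET to `/models` whose response wraps the entries in a `data` array. A minimal sketch, assuming that endpoint shape (the URL and response format are assumptions, not taken from this page):

```python
import json
import urllib.request

BASE_URL = "https://api.cerebras.ai/v1"  # assumed OpenAI-compatible base URL

def parse_model_ids(listing: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /models response body."""
    return [entry["id"] for entry in listing.get("data", [])]

def fetch_model_ids(api_key: str) -> list[str]:
    """GET /models with a bearer token and return the available model IDs."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_model_ids(json.load(resp))

# Abbreviated example of the expected response shape, parsed locally:
sample = {"object": "list", "data": [{"id": "llama3.1-8b"}, {"id": "gpt-oss-120b"}]}
print(parse_model_ids(sample))  # → ['llama3.1-8b', 'gpt-oss-120b']
```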
Configuration
- API Key - Sign up at cloud.cerebras.ai and generate an API key. Your key is stored securely in the macOS Keychain.
Setup
- Open TypeWhisper Settings > Plugins
- Find the Cerebras plugin and click Configure
- Enter your Cerebras API key
- Select an LLM model and choose Cerebras as your LLM provider