Back to Add-ons

Cerebras

Bundled

by TypeWhisper

LLM macOS
Cerebras settings

About

Cerebras delivers ultra-fast LLM inference using their custom Wafer-Scale Engine (WSE) hardware. The platform achieves speeds of around 3,000 tokens per second for large models like GPT-OSS 120B, making it one of the fastest cloud inference providers available. The plugin uses an OpenAI-compatible API and dynamically fetches available models.

Features

  • Ultra-fast inference (~3,000 tokens/sec for GPT-OSS 120B)
  • Dynamic model list with refresh from the Cerebras API
  • OpenAI-compatible API
  • API keys stored securely in the macOS Keychain

LLM Models

ModelID
Llama 3.1 8Bllama3.1-8b
GPT-OSS 120Bgpt-oss-120b
Qwen 3 235Bqwen-3-235b-a22b-instruct-2507
ZAI GLM 4.7zai-glm-4.7

Use the Refresh button to load the latest models from the Cerebras API.

Configuration

  • API Key - Sign up at cloud.cerebras.ai and generate an API key. Your key is stored securely in the macOS Keychain.

Setup

  1. Open TypeWhisper Settings > Plugins
  2. Find the Cerebras plugin and click Configure
  3. Enter your Cerebras API key
  4. Select an LLM model and choose Cerebras as your LLM provider