OpenAI Codex-Spark: Coding at 1,000 Tokens Per Second Is Here
OpenAI and Cerebras just launched Codex-Spark. At 1,000 tokens per second, AI pair programming finally feels like real-time collaboration.
Your IDE is finally faster than your brain
Imagine hitting ‘Enter’ and watching a 300-line Python script materialize before you can even take a sip of coffee.
That’s the reality of Codex-Spark.
For years, we’ve tolerated the “typing” animation of LLMs—that slow, rhythmic crawl of text that reminds us we’re waiting on a server somewhere. OpenAI just killed the wait. By partnering with Cerebras Systems, they’ve pushed inference speeds to 1,000 tokens per second.
This isn’t just a marginal improvement. It’s a shift in how you interact with a machine. When the AI responds at the speed of thought, the friction between your idea and the code disappears.
What Happened
OpenAI quietly integrated Codex-Spark into the ChatGPT ecosystem, specifically targeting developer workflows. This wasn’t a flashy keynote reveal; it was a performance nuke dropped into the developer community.
- The Speed: While standard GPT-4o hovers around 60-80 tokens per second, Spark hits 1,000 TPS. That is roughly a 12x to 17x increase in velocity.
- The Hardware: This feat is powered by Cerebras CS-3 clusters. Unlike traditional GPUs, these “Wafer-Scale Engines” are designed specifically for the massive data throughput required for high-speed inference.
- The Rollout: It is currently appearing as a toggle for ChatGPT Plus and Team users under the “Alpha Features” or “Beta” settings, with full API access expected shortly.
- The Latency: Time-to-first-token has dropped to sub-10ms levels, making the UI feel like a local text editor rather than a cloud service.
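That sub-10ms time-to-first-token claim is easy to verify yourself once you have access. Here is a small sketch: the helper works on any iterator, and the commented-out call shows how you would point it at the API (`stream=True` is the standard Chat Completions streaming flag; `codex-spark-preview` is the model identifier used later in this article, so treat it as an assumption until it appears in your dashboard):

```python
import time

def time_to_first_token(chunks):
    """Return seconds elapsed until the first chunk of a token stream
    arrives, or None if the stream is empty."""
    start = time.time()
    for _ in chunks:
        return time.time() - start
    return None

# Against the live API (hypothetical model id, standard streaming flag):
#
# stream = client.chat.completions.create(
#     model="codex-spark-preview",
#     messages=[{"role": "user", "content": "print('hi')"}],
#     stream=True,
# )
# print(f"TTFT: {time_to_first_token(stream) * 1000:.1f} ms")
```

Measuring against the first streamed chunk, rather than the full response, is what isolates latency from throughput.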
Why This Matters
Speed is a feature, but extreme speed is a new paradigm.
Before Spark, using AI for coding was an asynchronous task. You’d prompt, wait, read, and then integrate. It was a conversation with a very smart, very slow intern.
Now, it’s a live mirror.
Think about refactoring a massive legacy codebase. Normally, you’d dread the “context window lag” or the time it takes for the model to rewrite a 500-line file. With Codex-Spark, that rewrite finishes in seconds rather than minutes.
This matters because it keeps you in the flow state. Every second you spend watching a cursor blink is a second your brain spends wandering toward a Twitter tab. Spark keeps you locked in.
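The flow-state math is easy to sketch. The figure below assumes roughly 10 tokens per line of code, which is a rough heuristic, not a measured average; real files vary widely:

```python
TOKENS_PER_LINE = 10  # rough assumption; real code averages vary widely

def rewrite_seconds(lines, tokens_per_second):
    """Estimated wall-clock time to regenerate a file of `lines` lines."""
    return lines * TOKENS_PER_LINE / tokens_per_second

print(f"500 lines at   80 TPS: {rewrite_seconds(500, 80):.1f} s")    # 62.5 s
print(f"500 lines at 1000 TPS: {rewrite_seconds(500, 1000):.1f} s")  # 5.0 s
```

A minute of watching a cursor blink versus a few seconds is the difference between context-switching and staying locked in.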
How It Works
The secret sauce is the Cerebras hardware. Traditional H100 clusters are great, but they face bottlenecks when moving data between chips. Cerebras puts everything on a single giant wafer, allowing for bandwidth that makes standard networking look like a dial-up modem.
Here is how you can test the throughput with a few lines of Python once the API endpoint hits your dashboard:

```python
import time

import openai

client = openai.OpenAI(api_key="your_key")

start_time = time.time()
response = client.chat.completions.create(
    # The new spark-preview model identifier
    model="codex-spark-preview",
    messages=[{
        "role": "user",
        "content": "Write a full-stack FastAPI app with CRUD for a library system.",
    }],
)
elapsed = time.time() - start_time

token_count = response.usage.completion_tokens
print(f"Generated {token_count} tokens in {round(elapsed, 2)} seconds.")
print(f"Speed: {round(token_count / elapsed, 2)} tokens/sec")
```
In early benchmarks shared on the OpenAI Developer Community, users are seeing full boilerplate generation—stuff that used to take 30 seconds—finishing in under 2 seconds.
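At these speeds, streaming the output is worth wiring up too, so tokens hit your screen the instant they are generated. Here is a minimal sketch that drains a stream; it assumes the standard Chat Completions streaming chunk shape (`choices[0].delta.content`), and the usage comment reuses the hypothetical model id from the snippet above:

```python
def drain_stream(stream):
    """Print streamed content deltas as they arrive and return the
    full completion text. Expects standard Chat Completions chunks."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks carry no content
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)

# Hypothetical usage, assuming the same client and model id as above:
# stream = client.chat.completions.create(
#     model="codex-spark-preview",
#     messages=[{"role": "user", "content": "Write a CLI todo app."}],
#     stream=True,
# )
# full_text = drain_stream(stream)
```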
What to Do Next
- Check your settings: Open ChatGPT, go to Settings > Beta Features, and look for “Spark Speed” or “High-Throughput Mode.” Toggle it on immediately.
- Stress test your prompts: Don’t just ask for snippets. Give it a complex task—like converting a React class component to Functional Hooks with full TypeScript definitions—and watch it fly.
- Monitor your API credits: Speed is addictive, but remember that 1,000 tokens per second can burn through a budget quickly if you aren’t careful with your loops. Use it for the heavy lifting, not the small talk.
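To make that last point concrete, here is a minimal, hypothetical session guard. The class and the cap value are my own illustration, not an OpenAI feature; tune the cap to whatever your plan can absorb:

```python
class TokenBudget:
    """Track completion-token spend across a session and flag when a
    self-imposed cap is hit. Illustrative only, not an OpenAI feature."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, completion_tokens):
        self.used += completion_tokens

    def exhausted(self):
        return self.used >= self.max_tokens

# After each response:
#   budget.record(response.usage.completion_tokens)
#   if budget.exhausted():
#       break  # stop the loop before the bill does it for you
```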