Agent Hub

Streaming

Streaming means the language model's response is delivered token by token as soon as it's generated — the user sees the beginning while the model is still writing.

In detail

Instead of waiting 8 seconds for the complete response, the user sees the first words after about 200 ms and starts reading. That makes AI responses feel much faster, even though time-to-last-token stays the same.
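The difference between time-to-first-token and time-to-last-token can be illustrated with a minimal sketch. Everything here is hypothetical: a generator stands in for the model, emitting one token at a time with a fixed delay, and the consumer records when the first token arrives versus when the stream completes.

```python
import time

def generate_tokens(text, delay=0.01):
    # Hypothetical stand-in for a model: emit one token at a time,
    # with a small delay simulating per-token generation latency.
    for token in text.split():
        time.sleep(delay)
        yield token

def stream_response(text):
    # Consume tokens as they are produced. The caller sees the first
    # token long before the last one has been generated.
    start = time.monotonic()
    first_token_at = None
    tokens = []
    for token in generate_tokens(text):
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        tokens.append(token)
    total = time.monotonic() - start
    return " ".join(tokens), first_token_at, total

text = "Streaming delivers tokens as soon as they are generated"
response, ttft, ttlt = stream_response(text)
print(f"time-to-first-token: {ttft:.3f}s, time-to-last-token: {ttlt:.3f}s")
```

With eight tokens, time-to-first-token is roughly one delay while time-to-last-token is roughly eight; a non-streaming API would make the user wait the full duration before showing anything.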

We stream by default in the widget, the REST API, and the SDK. The only exception: when the response depends on a tool call that is still in flight.
