Streaming
Streaming means the language model's response is delivered token by token as it is generated — the user sees the beginning while the model is still writing.
In detail
Instead of waiting eight seconds for the complete response, the user sees the first words after about 200 ms and starts reading. That makes AI responses feel much faster, even though time-to-last-token stays the same — only time-to-first-token improves.
We stream by default in the widget, the REST API, and the SDK. The only exception: when the response depends on a tool call that is still in flight.
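The difference between streaming and waiting for the full response can be sketched with a simple generator — a minimal, self-contained illustration, not the product's actual transport (which in practice would use server-sent events or a similar mechanism):

```python
def generate_tokens(text):
    # Simulate a model emitting its response one token at a time.
    for token in text.split():
        yield token + " "

def stream_response(tokens):
    # A streaming consumer handles each chunk as it arrives —
    # in a UI, each chunk would be appended to the visible message
    # immediately rather than after the full response is ready.
    chunks = []
    for chunk in tokens:
        chunks.append(chunk)
    return "".join(chunks)

result = stream_response(generate_tokens("Streaming delivers tokens as they arrive"))
print(result)
```

The key point is that the consumer acts on each chunk inside the loop; the total generation time is unchanged, but the first chunk is usable right away.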
Related terms
- MCP: Model Context Protocol is an open standard from Anthropic that defines how AI agents talk to tools, data sources, and APIs — like USB for AI.
- AI agent: An AI agent is a program built on a language model that completes tasks on its own: it understands a request, plans steps, calls tools, and responds with a result instead of just text.
- Context window: The context window is the maximum amount of text (measured in tokens) a language model can process at once — typically 128k to 1M tokens with current models.