Agent Hub

Streaming

Streaming means the language model's response is delivered token by token as soon as it's generated — the user sees the beginning while the model is still writing.

In detail

Instead of waiting 8 seconds for the complete response, the user sees the first words after about 200 ms and starts reading. That makes AI responses feel much faster, even though time-to-last-token stays the same.
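The difference between time-to-first-token and time-to-last-token can be illustrated with a minimal sketch. Everything here is hypothetical: a generator stands in for the model, emitting one token at a time with a fixed delay, and the consumer records when the first token arrives versus when the stream completes.

```python
import time

def generate_tokens(text, delay=0.01):
    # Hypothetical stand-in for a model: emit one token at a time,
    # with a small delay simulating per-token generation latency.
    for token in text.split():
        time.sleep(delay)
        yield token

def stream_response(text):
    # Consume tokens as they are produced. The caller sees the first
    # token long before the last one has been generated.
    start = time.monotonic()
    first_token_at = None
    tokens = []
    for token in generate_tokens(text):
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        tokens.append(token)
    total = time.monotonic() - start
    return " ".join(tokens), first_token_at, total

text = "Streaming delivers tokens as soon as they are generated"
response, ttft, ttlt = stream_response(text)
print(f"time-to-first-token: {ttft:.3f}s, time-to-last-token: {ttlt:.3f}s")
```

With eight tokens, time-to-first-token is roughly one delay while time-to-last-token is roughly eight; a non-streaming API would make the user wait the full duration before showing anything.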

We stream by default in the widget, the REST API, and the SDK. The only exception: when the response depends on a tool call that is still in flight.
