This article describes a canonical architecture for chat agent sessions: the core layer we build at Orchestrator Studios to get performance from chat agent systems beyond what naive orchestration delivers.
Orchestration exists to overcome the limitations of a raw LLM. The throughline of the work, and what ties the architecture below together, is enforcing coherence over the inputs the LLM sees on every turn.
Concretely, that means four commitments:
- Externalized state — session state lives outside the LLM, indexed against a session ID held by the orchestrator.
- State-specific context management — the history and context fragments the LLM sees on a given turn are shaped to the session's current state, not just appended turn after turn.
- State-specific instructions — the system prompt is selected from a curated library, surfacing only the instructions relevant to the current state.
- State-specific tools — the tool list given to the LLM is the subset that makes sense for the current state, not the full inventory.
The rest of this article describes the structure that holds these four together. Four levels of nesting, a small number of moving parts, and one architectural decision that shapes everything: the LLM only sees what the orchestrator chooses to show it — never the orchestrator's full state, never its full library of instructions, never its full toolkit.
The Hierarchy
A session is the full lifespan of a user's interaction with the agent. It is identified by a session ID, and all accumulated state hangs off that ID. A session contains one or more turns. Each turn contains one or more iterations. The final iteration of every turn produces a final answer; all earlier iterations within that turn are tool calls.
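The hierarchy can be sketched as a minimal data model. The class and field names here are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Iteration:
    """A single LLM call: either a tool call or the turn's final answer."""
    kind: str       # "tool_call" or "final_answer"
    payload: dict

@dataclass
class Turn:
    """One user instruction plus the iterations it took to resolve it."""
    instruction: str
    iterations: list = field(default_factory=list)

    def is_complete(self) -> bool:
        # By definition, a completed turn ends in a final answer.
        return bool(self.iterations) and self.iterations[-1].kind == "final_answer"

@dataclass
class Session:
    """The full lifespan of a user's interaction, keyed by session ID."""
    session_id: str
    turns: list = field(default_factory=list)
```

The invariant worth noting: `is_complete` is defined by the kind of the last iteration, which matches the rule that every earlier iteration within a turn is a tool call.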
What a Turn Receives
Every turn begins when the user sends a new instruction. The orchestration layer receives three things on the way in:
- The session ID — the handle the orchestrator uses to reach session state. All prior exchanges, tool results, artifacts, decisions, and scratchpad are indexed against this ID, so passing it in is what gives the orchestrator access to everything the session has accumulated. Without the ID, the orchestrator has no path to the state.
- The new instruction — the user's message for this turn.
- Fresh context — anything that has changed in the environment since the last turn. The user may have switched screens, toggled a setting, selected a different document, or otherwise altered the surrounding UI state. This delta is passed alongside the instruction, not buried inside it.
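The three inputs above might be bundled like this; the names and example values are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class TurnInput:
    """What the orchestration layer receives when a turn begins."""
    session_id: str    # handle to everything the session has accumulated
    instruction: str   # the user's message for this turn
    fresh_context: dict = field(default_factory=dict)  # environment delta, kept separate

turn = TurnInput(
    session_id="sess-42",
    instruction="Summarize the open document",
    fresh_context={"active_document": "report.pdf", "screen": "editor"},
)
```

Keeping `fresh_context` as its own field, rather than folding it into the instruction text, is what lets the orchestrator reason about the environment delta separately from the user's intent.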
What the Orchestrator Assembles
Once the turn is in flight, the orchestrator uses the session ID to look up the session state, and combines that state with the fresh context to compose the turn's working materials:
- A custom system prompt — assembled once per turn from a curated library of instructions, surfacing only those relevant to the current context. It is not a static template; it is a turn-specific composition conditioned on where the session is and what just changed.
- A tool list — the subset of available tools that makes sense given the current session state and context. Tool availability is dynamic across the session.
- The conversation history — the relevant prior exchange, drawn from session state and shaped to fit the turn.
These three artifacts — system prompt, tools, history — plus the new user instruction define the LLM's working context for this turn.
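A minimal sketch of the assembly step, with stand-in helpers for the real selection logic (all names and policies here are invented for illustration):

```python
def assemble_working_context(state: dict, fresh_context: dict, instruction: str) -> dict:
    """Compose the turn's working materials from session state plus fresh context."""
    return {
        "system_prompt": prompt_for(state, fresh_context),
        "tools": tools_for(state),
        "history": history_for(state),
        "instruction": instruction,
    }

def prompt_for(state: dict, ctx: dict) -> str:
    # Turn-specific composition, conditioned on where the session is and what changed.
    return f"Current mode: {state.get('mode', 'default')}. Screen: {ctx.get('screen', '?')}."

def tools_for(state: dict) -> list:
    # Only the subset of the inventory that makes sense in the current mode.
    return [t for t in state.get("tool_inventory", []) if state.get("mode") in t["modes"]]

def history_for(state: dict) -> list:
    # Placeholder policy: surface only the most recent exchanges.
    return state.get("exchanges", [])[-5:]
```

The shape of the return value is the point: system prompt, tools, and history are computed fresh every turn from the same two inputs, state and fresh context.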
The LLM Sees a State-Specific Projection
This is the crucial design point. The LLM is never handed the orchestrator's raw materials — neither the full session state nor the orchestrator's full library of capabilities. What reaches the model on any given turn is a deliberately shaped projection, selected for the session's current state.
That curation runs along three axes.
State-Specific Context
The session state is the durable record — prior exchanges, tool results, artifacts, decisions, scratchpad. The orchestrator decides which slice of that record matters for the current state and shapes it accordingly. A long tool result from turn three might appear verbatim in turn four, summarized in turn seven, and not at all in turn twelve. The full state is always there, indexed against the session ID. What reaches the LLM is whatever projection fits this turn.
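One way to sketch that projection, assuming a simple recency-plus-kind policy; a real policy would be conditioned on the session's current state, not just recency:

```python
def project_history(exchanges: list) -> list:
    """Decide, per entry, whether it appears verbatim, summarized, or not at all."""
    projected = []
    for i, ex in enumerate(exchanges):
        if i >= len(exchanges) - 2:             # recent: verbatim
            projected.append(ex)
        elif ex.get("kind") == "tool_result":   # older tool result: summarized
            projected.append({"kind": "summary", "text": ex["text"][:60]})
        # everything else: not shown to the model on this turn at all
    return projected
```

The durable record is untouched by this function; it only shapes the view handed to the model, which is exactly the verbatim / summarized / omitted spectrum described above.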
State-Specific Instructions
The orchestrator holds a library of curated instructions — guidance, rules, role definitions, workflow steps, response formats, edge-case handling — ready to deploy. The system prompt the LLM sees on any given turn is a small, deliberate selection from that library, chosen to fit the current state: where the session is in its workflow, what mode is active, what just changed.
We do not stuff every instruction we could possibly give into every prompt. Most of the library stays held back. The instructions that surface are the ones relevant to this state — and picking the right moment to deploy each one is itself a design discipline. Irrelevant instructions become noise. Instructions that conflict with each other become bugs that surface as confused or inconsistent model behavior. A curated, well-timed prompt outperforms a maximalist one — every time.
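A minimal sketch of state-keyed instruction selection; the library entries and state names are invented for illustration:

```python
# Hypothetical instruction library; each entry names the states it applies to.
INSTRUCTION_LIBRARY = [
    {"text": "You are a document assistant.",                 "states": {"drafting", "reviewing"}},
    {"text": "Never modify the document without confirming.", "states": {"reviewing"}},
    {"text": "Cite a source for every factual claim.",        "states": {"research"}},
]

def compose_system_prompt(state: str) -> str:
    """Surface only the instructions relevant to the current state;
    the rest of the library stays held back."""
    return "\n".join(e["text"] for e in INSTRUCTION_LIBRARY if state in e["states"])
```

Most of the library never reaches the model on any given turn, which is the intended behavior, not a limitation.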
State-Specific Tools
The same logic applies to tools. The orchestrator has a full inventory of tools the agent could call. The list handed to the LLM on a given turn is the subset appropriate to the current state. Tools that don't belong in this state aren't shown to the model. Reducing the tool list isn't a limitation — it's how you keep the model from reaching for the wrong tool in the wrong moment, or from spending iterations debating between options that shouldn't all be on the table.
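The same filtering, sketched for tools; the inventory and state names are again invented:

```python
# Hypothetical full inventory; each tool declares the states it belongs to.
TOOL_INVENTORY = [
    {"name": "search_docs",  "states": {"research"}},
    {"name": "edit_section", "states": {"drafting"}},
    {"name": "add_comment",  "states": {"drafting", "reviewing"}},
]

def tools_for_state(state: str) -> list:
    """Hand the model only the subset appropriate to the current state."""
    return [t["name"] for t in TOOL_INVENTORY if state in t["states"]]
```

A tool absent from the returned list simply does not exist from the model's point of view on that turn.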
Why It Works
Curation is where the LLM's degrees of freedom are intentionally reduced. An LLM given everything will use everything, often in ways that drift from what the situation calls for. By narrowing what the model sees to what the situation actually requires, the orchestrator constrains the space of possible behaviors and makes the desired outcome the path of least resistance. The system prompt tells the model who it is for this turn. The tool list tells it what it can do right now. The curated history tells it what matters from the past. Everything outside that frame is held back by the orchestrator — available to inform the composition of the turn, but not exposed to the model directly.
The Iteration Loop
With the turn's materials assembled, the orchestrator enters an iteration loop. Each iteration is a single call to the LLM, which returns one of two things:
- A tool choice — the orchestrator executes the tool, appends the call and its result to the working context, and loops back.
- A final answer — the loop exits and the turn completes.
A turn may resolve in a single iteration (the LLM answers immediately) or take many (the LLM calls several tools first). The defining property of the final iteration is that it produces a final answer, not a tool call.
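The loop can be sketched as follows; `llm` and `execute_tool` are injected callables standing in for a real model client and tool runtime, and the result shape is an assumption, not a vendor API:

```python
def run_turn(llm, execute_tool, working_context, max_iterations=8):
    """One iteration per LLM call; exit when the model produces a final answer."""
    for _ in range(max_iterations):
        result = llm(working_context)
        if result["type"] == "final_answer":
            return result
        # Tool choice: execute it, append the call and its result, loop back.
        output = execute_tool(result["tool"], result.get("args", {}))
        working_context["history"].append(
            {"kind": "tool_call", "tool": result["tool"], "result": output})
    raise RuntimeError("turn exceeded its iteration budget")
```

The iteration budget is a practical safeguard the article doesn't mandate; without some bound, a model that never produces a final answer would loop forever.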
The Final Answer Has Two Components
The final answer is not a single thing. It has two components:
- A user-facing message — the response shown back to the user.
- A state output — a structured artifact written into session state, available to future turns through the orchestrator's curation.
Both are produced on every turn. They have different consumers, different formats, and often different content. The user-facing message is a presentation concern — what should the human see? The state output is a design decision — what does this turn need to record so that the next one starts well?
The split matters. Treating "the answer" as just the text returned to the user collapses two distinct outputs into one; systems designed that way are harder to reason about and brittle in practice.
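The two components might be modeled as follows; the field names and example values are illustrative:

```python
from dataclasses import dataclass

@dataclass
class FinalAnswer:
    """The two components of a turn's final answer, kept deliberately separate."""
    user_message: str   # presentation: what the human sees this turn
    state_output: dict  # record: what future turns may need, surfaced via curation

answer = FinalAnswer(
    user_message="Done. I've summarized the report.",
    state_output={"artifact": "summary-v1", "source_document": "report.pdf"},
)
```

Nothing forces the two fields to share content: the message is written for the user, the state output for the orchestrator's next composition.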
Turn Output Flows into Session State
Every turn can generate output that is written back to session state — not just the state output of the final answer, but intermediate artifacts as well: tool results, derived facts, summaries, decisions, scratchpad reasoning the agent wants to preserve. All of it is captured and indexed against the session ID.
Subsequent turns then have access to this accumulated output, but again only through the orchestrator's curation. When preparing the LLM's working context for a later turn, the orchestrator decides what prior output to surface, in whole or in part, and in what form, conditioned on what the new turn needs. The session state is the durable record. What reaches the LLM on any given turn is a deliberately shaped projection of it.
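A minimal sketch of the write-back step, with illustrative category names mirroring the kinds of output listed above:

```python
def record_turn_output(session_state: dict, turn_record: dict) -> None:
    """Append everything this turn produced to the durable record in place."""
    for kind in ("exchanges", "tool_results", "artifacts", "decisions", "scratchpad"):
        session_state.setdefault(kind, []).extend(turn_record.get(kind, []))
```

The write is append-only: state grows across turns, and deciding what to show later is the projection step's job, not the recorder's.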
Session State as the Through-Line
Session state accumulates from the first turn to the last. Every turn reads from it (to assemble system prompt, tools, and history) and writes back to it (the new exchange, tool results, intermediate artifacts, the final answer's state output, and any state changes the agent makes on the user's behalf). State is not refreshed per turn — it grows. This is what makes a session a session rather than a sequence of independent requests: the agent's understanding of the user, the task, and the environment compounds across turns, while the LLM itself only ever sees the slice the orchestrator chooses to show it.
The Design Lever
The LLM is doing what LLMs do — completing text, calling tools, producing answers. The behavior of the agent as a whole is determined by what the orchestrator chooses to put in front of the model on each turn, what it captures from each turn's output, and how it shapes that captured state into the next turn's view.
Build a careful orchestrator and you get a coherent agent. Skip that work and you get an unpredictable one — even with the best model behind it. The model is not the lever. The orchestrator is.