LangGraph Advanced: Part 2 β€” Multi-Agent Supervisor Architecture

Author Photo
LangGraph Advanced: Part 2 β€” Multi-Agent Supervisor Architecture

🀝 1. Why Multi-Agent Architectures?

By the end of the LangGraph Basics series, you could build a stateful chatbot with tool use, human-in-the-loop approval, and cross-session memory. Those are powerful ingredients β€” but they all share one underlying assumption: a single LLM node handles everything. The user asks a question, the agent thinks, and the agent responds. For narrow use cases that works perfectly. For real-world applications, it starts to break down.

Consider a wellness app. A user might ask: "I want to build lean muscle β€” what should I eat and how should I train?" A single generalist agent will attempt to answer both the nutrition and the fitness question in one go. The response will likely be acceptable, but shallow in both areas because the model is spreading its reasoning across two domains simultaneously. If you had a dedicated Nutrition Coach and a dedicated Fitness Coach β€” each fine-tuned on their domain through carefully crafted system prompts β€” they would each provide better, deeper, more contextually relevant advice.

That is the core insight behind multi-agent architectures: instead of one generalist handling everything, you compose a team of specialists. Each specialist focuses on what it knows best. A coordinator β€” the supervisor β€” decides who talks to the user and when.

1.1

🧱 The Single-Agent Bottleneck

A single LLM node becomes a bottleneck in three common scenarios:

🎭

Domain Depth

A single system prompt can't make an agent simultaneously expert in nutrition science, exercise physiology, and mindfulness research. Specialists carry deeper, more focused knowledge.

πŸ”€

Context Pollution

When one agent handles everything, unrelated conversation history piles up and can confuse the model. Specialists each maintain a cleaner, domain-focused context.

πŸ”§

Maintainability

Tuning a single monolithic prompt for all domains is brittle β€” changing one domain's instructions can break another. Each specialist has its own isolated prompt file.

πŸ“ˆ

Scalability

Adding a new domain means adding a new specialist node and updating the supervisor prompt β€” no changes to existing specialists. Monolithic agents require full rewrites.

1.2

🎯 The Supervisor Pattern

The supervisor pattern is the canonical multi-agent architecture in LangGraph. It has two types of components:

  • Supervisor node β€” an LLM node that reads the full conversation history and decides which specialist to call next, or whether the question has been fully answered. The supervisor never adds a message visible to the user; it only controls routing.
  • Specialist nodes β€” LLM nodes, each with a focused system prompt. When called, a specialist reads the full conversation, generates a domain-specific response, appends it to the message list, and hands control back to the supervisor.

The critical structural difference from basic conditional routing (covered in Basics Part 3) is the feedback loop. In conditional routing, the flow is: START β†’ supervisor β†’ specialist β†’ END. In the supervisor pattern, every specialist routes back to the supervisor: START β†’ supervisor β†’ specialist β†’ supervisor β†’ ... β†’ END. The supervisor re-reads the updated conversation β€” including the specialist's response β€” and decides the next step. This means a single user query can trigger multiple specialists in sequence.

πŸ“Œ Key distinction: Conditional routing (blog_10) is rule-based and always takes exactly one hop. The supervisor pattern is LLM-driven and can take one or more hops depending on the complexity of the query.
1.3

🧘 Our Scenario: AI Personal Wellness Coach

Throughout this post we'll build Coach Alex β€” an AI Personal Wellness Coach. Users can ask Coach Alex about nutrition, fitness, mindfulness, or any combination. Here's why this scenario genuinely requires multi-agent architecture:

  • A question like "What should I eat to lose weight?" needs a focused nutrition answer β€” the fitness agent shouldn't interfere.
  • A question like "I want to build lean muscle β€” what should I eat and how should I train?" needs both a nutrition response and a fitness response, in sequence, each building on the other.
  • A question like "I can't sleep because of work stress" needs a mindfulness response β€” neither nutrition nor fitness is relevant.

No rule-based router can handle all three scenarios correctly β€” it can't know from a single classification that the second query needs two specialists. The supervisor LLM reads the conversation after each agent responds and decides whether to call another specialist or stop. That dynamic reasoning is exactly what makes multi-agent necessary here.


βš™οΈ 2. Installation & Setup

This post uses Python 3.12. Verify your version before creating the virtual environment:

python --version # Python 3.12.x

Create and activate a virtual environment named langgraph:

python -m venv langgraph # macOS / Linux source langgraph/bin/activate # Windows langgraph\Scripts\activate

All posts in the LangGraph series share a single requirements.txt at the repo root. Install everything with:

langchain==1.2.17 langgraph==1.1.10 langchain-google-genai==4.2.2 python-dotenv==1.2.2 gradio==6.14.0
pip install -r requirements.txt

Get a free Gemini API key at Google AI Studio. Create a .env file inside the langgraph/ root:

GOOGLE_API_KEY=your_google_api_key_here GEMINI_MODEL_NAME=gemini-3-flash-preview GEMINI_TEMPERATURE=0.7 GEMINI_MAX_RETRIES=2
⚠️ Never commit .env to version control. Add it to .gitignore immediately.

Here is the complete project tree for this post:

advanced-2-multi-agent-supervisor/ β”œβ”€β”€ config.py # Config class β€” reads .env β”œβ”€β”€ llm.py # GeminiLLM wrapper β”œβ”€β”€ state.py # WellnessState TypedDict β†’ Section 4 β”œβ”€β”€ nodes.py # WellnessNodes β€” supervisor + 3 specialists β†’ Sections 5–6 β”œβ”€β”€ graph.py # WellnessGraph β€” supervisor loop, MemorySaver β†’ Section 7 β”œβ”€β”€ wellness_runner.py # Console entry point with 4-demo run β†’ Section 8 β”œβ”€β”€ app.py # Gradio Blocks with New Session button β†’ Section 9 β”œβ”€β”€ prompts/ # LLM prompt templates, one file per agent β”‚ β”œβ”€β”€ supervisor.txt # Supervisor routing instructions β”‚ β”œβ”€β”€ nutrition.txt # Nutrition Coach persona + response rules β”‚ β”œβ”€β”€ fitness.txt # Fitness Coach persona + response rules β”‚ └── mindfulness.txt # Mindfulness Coach persona + response rules β”œβ”€β”€ figure/ # Auto-generated graph diagrams (git-ignored) β”‚ β”œβ”€β”€ graph.mmd # Mermaid source β”‚ └── graph.png # PNG export └── .env # API keys β€” never commit

config.py and llm.py are covered in Section 2.1. state.py in Section 4. nodes.py in Sections 5 and 6. graph.py in Section 7. The runner and app are walked through in Sections 8 and 9.

2.1

πŸ”§ Configuring the LLM

config.py reads all settings from the .env file, making it trivial to tune the model name or temperature without touching any logic files:

import os from dotenv import load_dotenv load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), "..", ".env")) class Config: MODEL_NAME = os.getenv("GEMINI_MODEL_NAME", "gemini-3-flash-preview") TEMPERATURE = float(os.getenv("GEMINI_TEMPERATURE", 0.7)) MAX_RETRIES = int(os.getenv("GEMINI_MAX_RETRIES", 2))

llm.py wraps Google Gemini in a thin class. Every node that needs an LLM calls GeminiLLM().get_llm():

from langchain_google_genai import ChatGoogleGenerativeAI from config import Config class GeminiLLM: def __init__(self): self.llm = ChatGoogleGenerativeAI( model=Config.MODEL_NAME, temperature=Config.TEMPERATURE, max_retries=Config.MAX_RETRIES, ) def get_llm(self): return self.llm

🧠 3. Understanding the Supervisor Pattern

Before writing any code, it's worth understanding exactly what makes the supervisor pattern different from the conditional routing you already know from Basics Part 3.

3.1

πŸ” LLM-Based vs Rule-Based Routing

In Basics Part 3, the Customer Support Router used a classification node that invoked the LLM to label the message as billing, technical, or general. Then a separate Python function read that label and returned the correct node name. Once the specialist handled the message, the graph went directly to END. The entire flow was a straight line β€” always exactly one hop.

The supervisor pattern is fundamentally different. The routing decision is made by an LLM with structured output that reads the entire conversation history β€” including messages added by specialists in previous hops. After a specialist responds, control returns to the supervisor node, which re-reads the conversation and asks itself: "Has the question been fully answered? Should I call another specialist?" Only the LLM can make that contextual judgment. No Python function can.

Aspect Conditional Routing (blog_10) Supervisor Pattern (this post)
Routing decision Rule-based Python function LLM with structured output
Hops per query Always exactly 1 1 or more, dynamically decided
After specialist responds Goes to END Returns to supervisor
Multi-specialist queries Not possible Supervisor calls both in sequence
Reads previous agent output No Yes β€” full conversation history
3.2

πŸ“‹ Structured Output for Routing

The supervisor must always return a valid routing decision β€” it can't say "I think maybe nutrition?" or write a sentence explaining its reasoning. We enforce this with structured output: a Pydantic BaseModel that constrains the LLM's response to a fixed schema.

LangChain's with_structured_output() method wraps any chat model and forces it to return a typed object instead of free text. Under the hood, it uses the model's native tool-calling or JSON mode to guarantee schema compliance:

from typing import Literal from pydantic import BaseModel class SupervisorDecision(BaseModel): next_agent: Literal["nutrition", "fitness", "mindfulness", "FINISH"] reasoning: str supervisor_llm = llm.with_structured_output(SupervisorDecision)

The Literal type in next_agent is critical. It tells the LLM that the only valid values are the four listed strings. The LLM cannot hallucinate a node name like "sleep_agent" or "DONE". The conditional router in graph.py can then use a simple dictionary mapping without any error-handling for unexpected values.

βœ… Always use Literal for routing decisions. It prevents the LLM from inventing node names and makes the routing dictionary in your graph exhaustive by construction.
3.3

πŸ”„ The Supervisor Loop

The loop is created by wiring the graph so that every specialist node has an outgoing edge back to the supervisor. Here is what the execution trace looks like for a multi-domain query:

πŸ‘€
User Query
"I want to build lean muscle β€” what should I eat and how should I train?"
🎯
Supervisor (Turn 1)
Reads query. Decides: both nutrition and fitness are needed. Routes to nutrition first. Sets next_agent = "nutrition".
πŸ₯¦
Nutrition Agent
Reads full conversation (user query only so far). Generates a diet plan for muscle building. Appends AI message to state. Returns to supervisor.
🎯
Supervisor (Turn 2)
Reads updated conversation (user query + nutrition response). Sees fitness hasn't responded yet. Routes to fitness. Sets next_agent = "fitness".
πŸ’ͺ
Fitness Agent
Reads full conversation including nutrition response. Generates a training plan that complements the diet advice. Appends AI message to state. Returns to supervisor.
🎯
Supervisor (Turn 3)
Reads updated conversation (user query + nutrition response + fitness response). All domains covered. Sets next_agent = "FINISH". Graph ends.
βœ…
User Sees Both Responses
The nutrition and fitness AI messages are combined and returned β€” a comprehensive, multi-specialist answer to one query.

πŸ—‚οΈ 4. State Design

WellnessState has exactly two fields β€” one for the conversation and one for routing:

from typing import Annotated from langchain_core.messages import BaseMessage from langgraph.graph.message import add_messages from typing_extensions import TypedDict class WellnessState(TypedDict): # add_messages appends new messages instead of overwriting the list. messages: Annotated[list[BaseMessage], add_messages] # Set by the supervisor node to control which specialist to call next. # "FINISH" signals the conditional router to route to END. next_agent: str

The messages field uses the add_messages reducer from LangGraph (introduced in Basics Part 2), so every new message is appended rather than overwriting the list. All nodes β€” supervisor and specialists β€” read from this same list. This is how the supervisor can see what specialists have already said: their responses are ordinary AIMessage objects inside messages.

The next_agent field is a plain string with last-write-wins semantics (the default for fields without a reducer). The supervisor node is the only node that writes to it. Specialist nodes never touch it. The conditional router reads it to determine which edge to follow.

πŸ”— Design principle: Keep routing state separate from conversation state. messages is the shared blackboard that every agent reads and writes to; next_agent is a control signal that only the supervisor writes. Mixing them would make the graph logic harder to reason about.

We define WellnessState as our own TypedDict rather than using LangGraph's MessagesState shortcut, because we need the additional next_agent field for routing control. MessagesState only provides messages.


🎯 5. Building the Supervisor Agent

The supervisor is the heart of the multi-agent system. It has three components: a Pydantic schema that constrains its output, a node function that invokes the LLM, and a system prompt that describes the team and the routing rules.

5.1

πŸ“ The SupervisorDecision Schema

We define a Pydantic model with two fields. next_agent uses Literal to enumerate every valid destination, including "FINISH". The reasoning field encourages the LLM to reason explicitly before committing to a choice β€” this acts as a lightweight chain-of-thought that tends to improve routing accuracy:

from typing import Literal from pydantic import BaseModel class SupervisorDecision(BaseModel): next_agent: Literal["nutrition", "fitness", "mindfulness", "FINISH"] reasoning: str

When the structured-output LLM processes a message, it always returns a SupervisorDecision instance. decision.next_agent is guaranteed to be one of the four valid strings. The conditional router can therefore use a simple dict mapping with no fallback needed.

5.2

πŸ€– The Supervisor Node

The supervisor node is deliberately minimal. It prepends the system prompt, invokes the structured LLM, and returns only the routing decision β€” it never adds a user-visible message:

def supervisor_node(self, state: WellnessState) -> dict: """Read the full conversation and decide which specialist to call next.""" messages = [SystemMessage(content=self.supervisor_prompt)] + list(state["messages"]) decision = self.supervisor_llm.invoke(messages) return {"next_agent": decision.next_agent}

Notice that the return dictionary only updates next_agent. It does not include a messages key, so the message list is untouched. The specialist nodes are the only ones that append to messages. This keeps the supervisor transparent to the user β€” they only see specialist responses.

πŸ’‘ The supervisor reads the full state["messages"], which includes both the original user query and every specialist response added so far. After the nutrition agent responds, the supervisor sees the nutrition message in the list and uses that context when deciding whether to call the fitness agent next.
5.3

πŸ“ The Supervisor Prompt

The supervisor prompt lives in prompts/supervisor.txt. It introduces the team, maps each specialist name to a domain, and gives concrete routing rules:

You are Coach Alex, a wellness supervisor who coordinates a team of specialist coaches. Your team: - nutrition β€” Expert in diet, food, macronutrients, meal planning, and nutritional science - fitness β€” Expert in exercise, workouts, cardio, strength training, and physical performance - mindfulness β€” Expert in stress management, meditation, sleep hygiene, and mental wellbeing Your job is to read the current conversation and decide which specialist should respond next, or whether the question has already been fully answered. Routing rules: 1. Route to exactly one specialist at a time. 2. If a question touches multiple domains, route to the most relevant specialist first; in your next turn after that specialist has responded, route to the next specialist. 3. Once all relevant specialists have responded to the current user query, output next_agent = "FINISH". 4. Never output FINISH before at least one specialist has responded to the current user message. 5. If the conversation already contains an AI response to the user's latest message and all domains are covered, output FINISH.

Rule 4 is especially important. Without it, the supervisor might look at an empty conversation, decide the question is trivial, and immediately output FINISH before any specialist has responded β€” leaving the user with no answer at all. Rules 4 and 5 together ensure the supervisor always waits for at least one specialist and stops as soon as the query is fully covered.


πŸ‘₯ 6. Building the Specialist Agents

Each specialist is a simple LLM node that reads the full conversation history through its own domain-specific system prompt and appends its response to messages. Here is the nutrition specialist as an example:

def nutrition_node(self, state: WellnessState) -> dict: messages = [SystemMessage(content=self.nutrition_prompt)] + list(state["messages"]) response = self.llm.invoke(messages) return {"messages": [response]}

The fitness and mindfulness nodes are identical in structure β€” they differ only in which prompt they load. All three are defined inside the WellnessNodes class. The prompts are loaded in __init__ using the standard _load_prompt() helper:

_PROMPTS_DIR = os.path.join(os.path.dirname(__file__), "prompts") def _load_prompt(filename: str) -> str: with open(os.path.join(_PROMPTS_DIR, filename), "r") as f: return f.read().strip() class WellnessNodes: def __init__(self): self.llm = GeminiLLM().get_llm() self.supervisor_prompt = _load_prompt("supervisor.txt") self.nutrition_prompt = _load_prompt("nutrition.txt") self.fitness_prompt = _load_prompt("fitness.txt") self.mindfulness_prompt = _load_prompt("mindfulness.txt") # Bind structured output so the supervisor always returns a valid decision. self.supervisor_llm = self.llm.with_structured_output(SupervisorDecision)

Each specialist's prompt instructs it to begin every response with a branded prefix β€” πŸ₯¦ **Nutrition Coach:**, πŸ’ͺ **Fitness Coach:**, or 🧘 **Mindfulness Coach:**. This makes it clear in the Gradio UI which specialist is speaking, especially when multiple agents respond to one query.

Because all specialists read the full state["messages"] (which includes previous specialist responses), a fitness agent responding after a nutrition agent will naturally reference the dietary context when suggesting a training plan. The shared message list creates cross-agent coherence without any explicit message-passing code.

βœ… Prompt separation pays off here. If you later want to make the Nutrition Coach more cautious (e.g. adding a medical disclaimer), you edit only prompts/nutrition.txt. The fitness and mindfulness agents are unaffected. This is the main operational advantage of separating prompts into individual files.

πŸ”— 7. Assembling the Graph

The graph assembly is where the supervisor loop comes to life. There are two key steps: writing the route function and wiring the edges.

7.1

🧭 The Route Function

The route function reads next_agent from state and returns its value unchanged. LangGraph's add_conditional_edges() uses this return value as a key to look up the actual node name in the mapping dictionary:

def route_by_agent(state: WellnessState) -> str: """Read next_agent from state and return the node name (or END).""" return state.get("next_agent", "FINISH")

The default "FINISH" is a safety fallback for the very first graph invocation, before the supervisor has had a chance to set next_agent. In practice this edge is never taken because the graph always starts at the supervisor node (via START β†’ supervisor), not at the conditional edge.

7.2

⚑ Wiring the Supervisor Loop

The complete _build() method registers all nodes, sets up the conditional edges from the supervisor, and β€” the crucial detail β€” adds a return edge from each specialist back to the supervisor:

from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, START, StateGraph def _build(self): graph = StateGraph(WellnessState) graph.add_node("supervisor", self.nodes.supervisor_node) graph.add_node("nutrition", self.nodes.nutrition_node) graph.add_node("fitness", self.nodes.fitness_node) graph.add_node("mindfulness", self.nodes.mindfulness_node) # Every run starts at the supervisor. graph.add_edge(START, "supervisor") # Supervisor decides which specialist to call, or FINISH to end. graph.add_conditional_edges( "supervisor", route_by_agent, { "nutrition": "nutrition", "fitness": "fitness", "mindfulness": "mindfulness", "FINISH": END, }, ) # Each specialist returns control to the supervisor after responding. graph.add_edge("nutrition", "supervisor") graph.add_edge("fitness", "supervisor") graph.add_edge("mindfulness", "supervisor") return graph.compile(checkpointer=MemorySaver())

The three add_edge calls at the bottom are what create the loop. Without them, the graph would be identical to Basics Part 3's conditional routing β€” specialists would run once and the graph would end. With them, the supervisor gets to re-evaluate after every specialist response, enabling multi-hop routing.

MemorySaver is the in-memory checkpointer from LangGraph. It gives each thread_id its own conversation history that persists across multiple invoke() calls within the same process. Unlike SqliteSaver (used in Advanced Part 1), MemorySaver resets when the process restarts β€” it's ideal for session-scoped memory without the overhead of a database file.

πŸ“Œ Loop safety: The supervisor loop can theoretically run indefinitely if the LLM never outputs FINISH. The supervisor prompt's rules 4 and 5 prevent this in practice β€” but for production systems, consider adding a MAX_HOPS counter to WellnessState as a hard ceiling.

πŸ—οΈ 8. Complete Example: AI Personal Wellness Coach

With all the concepts in place, let's walk through the complete implementation of Coach Alex β€” an AI Personal Wellness Coach that uses the supervisor pattern to route queries to nutrition, fitness, and mindfulness specialists.

8.1

πŸ—ΊοΈ Architecture Overview

🎯

Supervisor

Reads the full conversation history and decides which specialist to call next (or FINISH), setting next_agent via structured output.

πŸ₯¦

Nutrition Agent

Answers diet, meal planning, and calorie questions. Appends its response to state, then returns to the supervisor.

πŸ’ͺ

Fitness Agent

Handles workout plans and training frequency. Reads nutrition responses already in state so its advice complements the diet plan.

🧘

Mindfulness Agent

Covers stress management, sleep hygiene, and breathing techniques β€” called when the query touches mental wellness.

8.2

πŸ“ Project Structure

advanced-2-multi-agent-supervisor/ β”œβ”€β”€ config.py # Config class β€” reads .env β”œβ”€β”€ llm.py # GeminiLLM wrapper β”œβ”€β”€ state.py # WellnessState TypedDict β”œβ”€β”€ nodes.py # WellnessNodes β€” supervisor + 3 specialist nodes β”œβ”€β”€ graph.py # WellnessGraph β€” supervisor loop, MemorySaver β”œβ”€β”€ wellness_runner.py # Console entry point with 4-demo run β”œβ”€β”€ app.py # Gradio Blocks with New Session button β”œβ”€β”€ prompts/ # LLM prompt templates β€” one file per agent β”‚ β”œβ”€β”€ supervisor.txt # Supervisor routing instructions β”‚ β”œβ”€β”€ nutrition.txt # Nutrition Coach persona + response rules β”‚ β”œβ”€β”€ fitness.txt # Fitness Coach persona + response rules β”‚ └── mindfulness.txt # Mindfulness Coach persona + response rules β”œβ”€β”€ figure/ # Auto-generated graph diagrams (git-ignored) β”‚ β”œβ”€β”€ graph.mmd # Mermaid source β”‚ └── graph.png # PNG export └── .env # API keys β€” never commit
8.3

πŸ—‚οΈ State (state.py)

from typing import Annotated from langchain_core.messages import BaseMessage from langgraph.graph.message import add_messages from typing_extensions import TypedDict class WellnessState(TypedDict): # add_messages appends new messages instead of overwriting the list. messages: Annotated[list[BaseMessage], add_messages] # Set by the supervisor node to control which specialist to call next. # "FINISH" signals the conditional router to route to END. next_agent: str
8.4

πŸ€– Nodes (nodes.py)

import os from typing import Literal from langchain_core.messages import SystemMessage from pydantic import BaseModel from llm import GeminiLLM from state import WellnessState _PROMPTS_DIR = os.path.join(os.path.dirname(__file__), "prompts") def _load_prompt(filename: str) -> str: with open(os.path.join(_PROMPTS_DIR, filename), "r") as f: return f.read().strip() def _extract_text(content) -> str: """Normalise langchain-google-genai 4.x content (list or str) to plain str.""" if isinstance(content, str): return content if isinstance(content, list): return "".join( b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text" ) return str(content) class SupervisorDecision(BaseModel): next_agent: Literal["nutrition", "fitness", "mindfulness", "FINISH"] reasoning: str class WellnessNodes: def __init__(self): self.llm = GeminiLLM().get_llm() self.supervisor_prompt = _load_prompt("supervisor.txt") self.nutrition_prompt = _load_prompt("nutrition.txt") self.fitness_prompt = _load_prompt("fitness.txt") self.mindfulness_prompt = _load_prompt("mindfulness.txt") self.supervisor_llm = self.llm.with_structured_output(SupervisorDecision) def supervisor_node(self, state: WellnessState) -> dict: messages = [SystemMessage(content=self.supervisor_prompt)] + list(state["messages"]) decision = self.supervisor_llm.invoke(messages) return {"next_agent": decision.next_agent} def nutrition_node(self, state: WellnessState) -> dict: messages = [SystemMessage(content=self.nutrition_prompt)] + list(state["messages"]) response = self.llm.invoke(messages) return {"messages": [response]} def fitness_node(self, state: WellnessState) -> dict: messages = [SystemMessage(content=self.fitness_prompt)] + list(state["messages"]) response = self.llm.invoke(messages) return {"messages": [response]} def mindfulness_node(self, state: WellnessState) -> dict: messages = [SystemMessage(content=self.mindfulness_prompt)] + list(state["messages"]) response = self.llm.invoke(messages) return {"messages": [response]}
8.5

πŸ”— Graph Assembly (graph.py)

import os from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, START, StateGraph from nodes import WellnessNodes from state import WellnessState FIGURE_DIR = os.path.join(os.path.dirname(__file__), "figure") def route_by_agent(state: WellnessState) -> str: return state.get("next_agent", "FINISH") class WellnessGraph: def __init__(self): self.nodes = WellnessNodes() self.compiled_graph = self._build() def _build(self): graph = StateGraph(WellnessState) graph.add_node("supervisor", self.nodes.supervisor_node) graph.add_node("nutrition", self.nodes.nutrition_node) graph.add_node("fitness", self.nodes.fitness_node) graph.add_node("mindfulness", self.nodes.mindfulness_node) graph.add_edge(START, "supervisor") graph.add_conditional_edges( "supervisor", route_by_agent, { "nutrition": "nutrition", "fitness": "fitness", "mindfulness": "mindfulness", "FINISH": END, }, ) graph.add_edge("nutrition", "supervisor") graph.add_edge("fitness", "supervisor") graph.add_edge("mindfulness", "supervisor") return graph.compile(checkpointer=MemorySaver()) def save_figure(self): os.makedirs(FIGURE_DIR, exist_ok=True) mmd_path = os.path.join(FIGURE_DIR, "graph.mmd") with open(mmd_path, "w") as f: f.write(self.compiled_graph.get_graph().draw_mermaid()) png_path = os.path.join(FIGURE_DIR, "graph.png") with open(png_path, "wb") as f: f.write(self.compiled_graph.get_graph().draw_mermaid_png()) print(f" Graph saved β†’ {mmd_path}") print(f" Graph saved β†’ {png_path}") def get_compiled_graph(self): return self.compiled_graph
8.6

▢️ Runner & Console Output (wellness_runner.py)

WellnessRunner wraps the compiled graph. Its chat() method snapshots the message count before invoking the graph, then collects every new AI message added during the run and joins them into a single string:

class WellnessRunner: def __init__(self): self.wellness_graph = WellnessGraph() self.app = self.wellness_graph.get_compiled_graph() def save_figure(self): self.wellness_graph.save_figure() def _config(self, thread_id: str) -> dict: return {"configurable": {"thread_id": thread_id}} def chat(self, message: str, thread_id: str) -> str: state_before = self.app.get_state(self._config(thread_id)) count_before = len(state_before.values.get("messages", [])) if state_before.values else 0 self.app.invoke( {"messages": [HumanMessage(content=message)]}, config=self._config(thread_id), ) state_after = self.app.get_state(self._config(thread_id)) messages_after = state_after.values.get("messages", []) new_ai = [m for m in messages_after[count_before:] if m.type == "ai"] return "\n\n".join(_extract_text(m.content) for m in new_ai) def get_history(self, thread_id: str) -> list: state = self.app.get_state(self._config(thread_id)) return state.values.get("messages", []) if state.values else []

The demo section in wellness_runner.py runs four queries across two threads to showcase every key behaviour:

Demo 1 "I've been feeling stressed lately and can't sleep well. Can you help?"
Shows Single-hop routing β€” supervisor routes to mindfulness; mindfulness responds; supervisor outputs FINISH.
Demo 2 "I want to build lean muscle. What should I eat and how should I train?"
Shows Multi-hop routing β€” supervisor routes to nutrition, then fitness. Both specialist responses are joined and returned together.
Demo 3 "Can you give me a sample 3-day meal plan based on what you recommended?"
Shows Session memory β€” same thread_id, nutrition agent references the muscle-building context from Demo 2.
Demo 4 "What are some breathing exercises I can do to calm anxiety at work?"
Shows Thread isolation β€” new thread_id, no knowledge of the previous user's conversation.

Run the demo with:

cd advanced-2-multi-agent-supervisor && python wellness_runner.py

Expected console output:

============================================================ LangGraph Advanced β€” AI Personal Wellness Coach Demo ============================================================ Saving graph architecture... Graph saved β†’ figure/graph.mmd Graph saved β†’ figure/graph.png ──────────────────────────────────────────────────────────── Demo 1: Single-agent routing (stress & sleep) ──────────────────────────────────────────────────────────── πŸ™‹ User: I've been feeling stressed lately and can't sleep well. Can you help? πŸ€– Coach Alex: 🧘 **Mindfulness Coach:** Stress and poor sleep form a feedback loop β€” addressing both together works better... ──────────────────────────────────────────────────────────── Demo 2: Multi-agent routing (nutrition + fitness) ──────────────────────────────────────────────────────────── πŸ™‹ User: I want to build lean muscle. What should I eat and how should I train? [Supervisor called 2 specialist(s) for this query] πŸ€– Coach Alex: πŸ₯¦ **Nutrition Coach:** For lean muscle building, aim for a slight caloric surplus of 200–300 kcal above maintenance... πŸ’ͺ **Fitness Coach:** Here is a 4-day training split optimised for lean muscle growth... ──────────────────────────────────────────────────────────── Demo 3: Follow-up question (session memory) ──────────────────────────────────────────────────────────── πŸ™‹ User: Can you give me a sample 3-day meal plan based on what you recommended? πŸ€– Coach Alex: πŸ₯¦ **Nutrition Coach:** Based on the muscle-building guidelines I shared, here is a sample 3-day plan... βœ… Nutrition Coach remembered the muscle-building context from turn 2. ──────────────────────────────────────────────────────────── Demo 4: Thread isolation (new user, fresh context) ──────────────────────────────────────────────────────────── πŸ™‹ Sam: What are some breathing exercises I can do to calm anxiety at work? πŸ€– Coach Alex: 🧘 **Mindfulness Coach:** Box breathing is one of the most effective techniques for workplace anxiety... βœ… Sam's session has no knowledge of the previous user's conversation. ============================================================
8.7

πŸ“Š Graph Diagram

Running runner.save_figure() at the start of the demo saves both a Mermaid source file and a PNG into the figure/ folder. The generated Mermaid diagram looks like this:

flowchart TD S([__start__]) --> supervisor(supervisor) supervisor -.->|"FINISH"| E([__end__]) supervisor -.->|"nutrition"| nutrition(nutrition) supervisor -.->|"fitness"| fitness(fitness) supervisor -.->|"mindfulness"| mindfulness(mindfulness) nutrition --> supervisor fitness --> supervisor mindfulness --> supervisor style S fill:#e8f5e9,stroke:#43a047,color:#1b5e20 style E fill:#fce4ec,stroke:#e53935,color:#b71c1c style supervisor fill:#e3f2fd,stroke:#1e88e5,color:#0d47a1 style nutrition fill:#fff3e0,stroke:#fb8c00,color:#e65100 style fitness fill:#f3e5f5,stroke:#8e24aa,color:#4a148c style mindfulness fill:#e0f2f1,stroke:#00897b,color:#004d40

Figure 2: The compiled LangGraph β€” supervisor routes to specialists via conditional edges; each specialist loops back. The graph ends when the supervisor returns FINISH.


πŸ–₯️ 9. Web Interface

The Gradio app in app.py wraps the runner in a gr.Blocks layout with a chat interface and a "New Session" button. The respond method uses yield (required by Gradio 6.x) β€” the full combined response is assembled from all specialist calls and then yielded as a single string:

import uuid import gradio as gr from wellness_runner import WellnessRunner class WellnessApp: def __init__(self): self.runner = WellnessRunner() def respond(self, message: str, _history: list, thread_id: str): if not message.strip(): yield "" return yield self.runner.chat(message, thread_id) def launch(self): with gr.Blocks(title="🧘 AI Personal Wellness Coach") as demo: thread_state = gr.State(value=str(uuid.uuid4())) chat = gr.ChatInterface( fn=self.respond, title="🧘 AI Personal Wellness Coach β€” Coach Alex", description=( "Ask Coach Alex about nutrition, fitness, or mindfulness. " "Complex questions are handled by multiple specialist coaches." ), additional_inputs=[thread_state], ) gr.ClearButton( [chat.chatbot, chat.textbox], value="πŸ”„ New Session", variant="primary", ).click( fn=lambda: str(uuid.uuid4()), outputs=[thread_state], ) demo.launch(css=".gradio-container { max-width: 860px !important; }") if __name__ == "__main__": WellnessApp().launch()

Start the web UI with:

cd advanced-2-multi-agent-supervisor && python app.py
AI Personal Wellness Coach β€” Gradio Web UI

Figure 3: The Gradio web UI β€” Coach Alex chat interface with the New Session button.

The app opens at http://127.0.0.1:7860. Each gr.State holds a UUID that acts as the thread_id. Clicking "New Session" generates a fresh UUID, giving the user an isolated conversation history without restarting the server. If the user asks a multi-domain question β€” say, "How can I reduce belly fat through diet and exercise?" β€” they'll see two specialist responses, clearly labelled, returned as one cohesive answer.

9.1

πŸ’‘ What to Try

These queries exercise the different routing paths β€” from single-specialist calls to full three-agent responses and the New Session flow:

Nutrition only "I want to lose 5 kg in two months. What should I eat daily?"
Observe Supervisor reads the query and routes only to the Nutrition Agent β€” no fitness or mindfulness call. You get one specialist response covering calorie targets, macros, and meal timing.
Fitness only "Can you build me a 3-day beginner workout plan for home, no equipment?"
Observe Supervisor routes only to the Fitness Agent. One response with a structured workout plan β€” bodyweight exercises, sets, reps, and rest days.
Two specialists "I want to build lean muscle. What should I eat and how should I train?"
Observe Supervisor routes to Nutrition first, then Fitness. The Fitness Agent reads the nutrition response already in state and aligns the training plan with the diet advice β€” two coordinated specialist replies in one answer.
All three specialists "I'm exhausted, stressed at work, and gaining weight. I need help with diet, exercise, and stress β€” where do I start?"
Observe All three agents are called in sequence. Nutrition, Fitness, and Mindfulness each contribute a section. The Mindfulness Agent sees the other two responses in state and frames its advice as a complement β€” one comprehensive multi-specialist reply.
Memory + New Session Turn 1: "I'm vegan and lactose intolerant." β€” Turn 2: "Suggest a high-protein breakfast." β€” Then click New Session and ask the same breakfast question again.
Observe Turn 2 references your dietary restrictions from Turn 1 β€” Coach Alex remembers. After clicking New Session, the conversation resets to a fresh UUID thread and the dietary context is gone β€” breakfast suggestions become generic.

βœ… 10. Conclusion

The multi-agent supervisor pattern extends everything you know about conditional routing with one important change: the routing decision is made by an LLM that reads the full conversation, and control returns to that LLM after every specialist responds. This creates a loop that can drive a query through one specialist or several, depending on what the conversation demands.

In Coach Alex, that loop lets a single user message receive deep, domain-specific advice from a Nutrition Coach and a Fitness Coach in sequence, with each specialist naturally building on what the other said. No rule-based router could achieve this without knowing ahead of time exactly how many domains each query spans.

The building blocks you've used in this post scale directly to more complex systems. Adding a fourth specialist β€” say, a Sleep Coach β€” means adding one node, one prompt file, one Literal option in SupervisorDecision, and one entry in the conditional edge mapping. The supervisor prompt needs a one-line description of the new agent's domain. Nothing else changes.

  • Structured output (with_structured_output) keeps routing deterministic β€” the LLM can never return an invalid destination.
  • The supervisor loop (specialist β†’ supervisor β†’ specialist β†’ ...) enables multi-hop routing without any explicit loop counter in your code.
  • Prompt separation (prompts/*.txt) lets you tune each specialist's behaviour without touching the others.
  • MemorySaver gives every thread_id its own isolated conversation history across multiple invocations within the same session.
  • Scalability β€” adding a new specialist requires only one new node, one prompt file, one Literal option, and one entry in the routing map. Nothing else changes.

In the next part of the LangGraph Advanced series, we'll go further and add real-world tool use to specialist agents β€” giving them the ability to call external APIs and perform computations rather than relying purely on LLM knowledge.

πŸ”— LangGraph Advanced Series β€” Part 2 of 5: This post covers the Multi-Agent Supervisor pattern. Part 1 introduced ReAct agents with tool use. Part 3 adds RAG with conditional routing. Part 4 builds a real-estate advisor with financial tools. Part 5 completes the series with MCP integration.

Technical Stacks

Technical Stacks

Python Python
LangGraph LangGraph
LangChain LangChain
Gemini Gemini
Gradio Gradio
Pydantic Pydantic
Download

Download Source Code

LangGraph Advanced Part 2 β€” AI Personal Wellness Coach

View on GitHub