LangGraph Advanced: Part 1 — Conversational Chatbot with Memory

1. Why Production Chatbots Need More Than Basic Memory

LangGraph's checkpointers (covered in Basics Part 4) solve persistence — every message is stored in a SQLite database and restored for the next turn. That solves the between-session memory problem. But there is a second, harder problem hiding inside long conversations: the context window.

Every LLM has a maximum number of tokens it can process in a single call. Pass in too many tokens and the call fails, the response gets truncated, or the cost skyrockets. For short demos this never matters. For a real chatbot that users return to day after day, it becomes the dominant engineering challenge.

1.1 The Unbounded History Problem

Each time a user sends a message, LangGraph restores the full message list from the checkpointer and passes it — plus the new message — to the LLM. After 20 turns that list contains 40 messages. After 100 turns it contains 200 messages. The token count grows without bound.

The hidden cost of naïve persistence

A naïve chatbot that stores every message sends roughly 10× more input tokens per call after 100 turns than after 10 turns (assuming messages of similar length), and eventually hits a hard error when the context limit is exceeded. The state must be actively managed, not just accumulated.

This is not a LangGraph-specific limitation — it applies to any stateful LLM application. LangGraph just gives you the right hooks to solve it cleanly.
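The growth described above can be put into numbers with a small back-of-the-envelope model. This is only a sketch: the figure of 50 tokens per message is an assumption chosen for illustration, and real messages vary widely in length.

```python
# Hypothetical figure: average tokens per message (an assumption for illustration).
TOKENS_PER_MESSAGE = 50


def input_tokens_at_turn(turn: int, tokens_per_message: int = TOKENS_PER_MESSAGE) -> int:
    """Input tokens sent on a given turn when nothing is ever trimmed.

    Before turn `turn`, the history holds 2 * (turn - 1) messages
    (one user + one assistant message per completed turn); the new
    user message adds one more.
    """
    history_messages = 2 * (turn - 1)
    return (history_messages + 1) * tokens_per_message


def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens sent across the first `turns` calls."""
    return sum(input_tokens_at_turn(t) for t in range(1, turns + 1))


for turn in (10, 100):
    print(turn, input_tokens_at_turn(turn), cumulative_input_tokens(turn))
```

Under these assumptions the per-call input grows linearly with the turn number, while the total sent over the whole conversation grows quadratically, which is why long-lived threads get expensive faster than intuition suggests.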

1.2 Trim vs Summarise

There are two mainstream strategies for controlling message history:

Strategy	How it works	What is lost	Best for
Trim	Delete the oldest N messages once the list exceeds a threshold	All detail in deleted messages — gone forever	Customer-support bots where old turns are truly irrelevant
Summarise	Ask the LLM to compress old messages into a short paragraph, then delete them	Verbatim wording — but key facts are preserved in the summary	Assistants that need to remember facts across many sessions (preferences, history)

For a recipe assistant that remembers dietary restrictions and favourite cuisines, trimming would quietly discard facts the user mentioned two days ago. Summarisation compresses the conversation while keeping those facts alive in the summary field. That is the pattern this post teaches.
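To make the difference concrete, here is a minimal sketch of both strategies on plain Python tuples. Everything in it is hypothetical scaffolding: `Message`, `trim`, `summarise`, and `fake_compress` are illustrative names, and `fake_compress` stands in for the LLM call that a real summariser would make.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text), a stand-in for real message objects


def trim(history: list[Message], keep_last: int) -> list[Message]:
    """Trim strategy: drop everything except the newest `keep_last` messages."""
    return history[-keep_last:] if keep_last > 0 else []


def summarise(
    history: list[Message],
    keep_last: int,
    summary: str,
    compress: Callable[[str, list[Message]], str],
) -> tuple[str, list[Message]]:
    """Summarise strategy: fold old messages into the summary, keep the newest."""
    if keep_last > 0:
        old, recent = history[:-keep_last], history[-keep_last:]
    else:
        old, recent = history, []
    if not old:
        return summary, recent
    return compress(summary, old), recent


def fake_compress(summary: str, old: list[Message]) -> str:
    """Hypothetical stand-in for the LLM call: keeps only user statements."""
    facts = [text for role, text in old if role == "user"]
    return " ".join(filter(None, [summary, *facts]))


history = [
    ("user", "I am vegetarian and allergic to nuts."),
    ("assistant", "Noted!"),
    ("user", "Suggest a quick dinner."),
    ("assistant", "Try a tomato and spinach pasta."),
]

trimmed = trim(history, keep_last=2)
summary, recent = summarise(history, keep_last=2, summary="", compress=fake_compress)

print(trimmed)   # the dietary facts are gone
print(summary)   # the dietary facts survive here
```

Both strategies leave the same two recent messages in the list; only the summarise path still carries the dietary restriction forward.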

1.3 Our Scenario: Personal Recipe Assistant

Throughout this post we build Chef Aria — a Personal Recipe Assistant that remembers your dietary preferences, favourite cuisines, and past recipe discussions across many sessions. A user might tell Chef Aria on Monday that they are vegetarian and allergic to nuts. On Thursday, Aria should still know this — even if the original Monday message has long since been compressed into the summary.

Why this scenario needs summarisation

Dietary preferences and cooking history are valuable long-term context. A trim strategy would silently delete them. Summarisation keeps the facts alive while preventing the token count from growing indefinitely — exactly what a production recipe assistant needs.

The app uses SqliteSaver for cross-session persistence and a Gradio streaming UI so responses appear token by token. By the end of this post you will have a fully working chatbot you can run locally and reuse as a template for any long-running assistant.

2. Installation & Setup

Install the required packages in a virtual environment:

pip install langgraph langgraph-checkpoint-sqlite langchain-google-genai langchain-core gradio python-dotenv

Create a .env file in the project root with your Gemini API key and optional configuration overrides:

GEMINI_API_KEY=your_key_here
GEMINI_MODEL_NAME=gemini-3-flash-preview
GEMINI_TEMPERATURE=0.7
SUMMARY_THRESHOLD=6

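SUMMARY_THRESHOLD=6 means the summarisation node fires once state holds more than 6 messages. Because every turn adds two messages (one user, one assistant), that first happens at the end of the fourth turn, with 8 messages in state. Raise the value to allow longer free-running conversation before compressing.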

2.1 Configuring the LLM

All runtime settings live in a single Config class so they are easy to tune without hunting through multiple files:

import os

from dotenv import load_dotenv

load_dotenv()


class Config:
    MODEL_NAME = os.getenv("GEMINI_MODEL_NAME", "gemini-3-flash-preview")
    TEMPERATURE = float(os.getenv("GEMINI_TEMPERATURE", 0.7))
    MAX_RETRIES = int(os.getenv("GEMINI_MAX_RETRIES", 2))
    SUMMARY_THRESHOLD = int(os.getenv("SUMMARY_THRESHOLD", 6))
    DB_PATH = os.path.join(os.path.dirname(__file__), "recipe_chat.db")

DB_PATH uses os.path.dirname(__file__) so the SQLite file is always created next to config.py, regardless of which directory you run the app from. The LLM wrapper reads these values:

from langchain_google_genai import ChatGoogleGenerativeAI

from config import Config


class GeminiLLM:
    def get_llm(self) -> ChatGoogleGenerativeAI:
        return ChatGoogleGenerativeAI(
            model=Config.MODEL_NAME,
            temperature=Config.TEMPERATURE,
            max_retries=Config.MAX_RETRIES,
        )

3. State Design for Production Chatbots

In Basics Part 2 you learned that LangGraph state is a TypedDict and that fields with an Annotated reducer accumulate values instead of overwriting them. The production chatbot state builds directly on that foundation — it just adds one new field: summary.

3.1 Adding a summary Field

The full state for the recipe assistant is deliberately minimal — just two fields:

from typing import Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict


class RecipeState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    summary: str

messages — the add_messages reducer appends new messages instead of overwriting the list. It also handles RemoveMessage instructions: when the summarise node returns a list containing RemoveMessage(id=...) objects, the reducer deletes those messages by ID.
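summary: a plain Python str with no special reducer (last-write-wins). The summarise node writes a new value here each time it runs. Until the first summarisation nothing has been written to it, which is why the nodes read it with state.get("summary", "") and treat a missing value as an empty string.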

We define RecipeState as our own TypedDict rather than using LangGraph's MessagesState shortcut because we need the additional summary field — MessagesState only provides messages.
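The two merge rules can be illustrated without LangGraph at all. The sketch below is a simplified model, not the library's implementation: `Msg`, `Remove`, and `reduce_messages` are hypothetical names, and the real add_messages reducer does more (for example, it replaces an existing message when an update carries the same ID).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for a chat message with a stable ID."""
    id: str
    text: str


@dataclass(frozen=True)
class Remove:
    """Hypothetical stand-in for a removal instruction targeting one ID."""
    id: str


def reduce_messages(current: list[Msg], update: list) -> list[Msg]:
    """Simplified model of an append-or-remove reducer (not LangGraph's code)."""
    result = list(current)
    for item in update:
        if isinstance(item, Remove):
            result = [m for m in result if m.id != item.id]
        else:
            result.append(item)
    return result


state = {"messages": [Msg("1", "hi"), Msg("2", "hello")], "summary": ""}

# A node returns a partial update; each field is merged by its own rule.
update = {"messages": [Remove("1"), Msg("3", "new")], "summary": "User said hi."}

state = {
    "messages": reduce_messages(state["messages"], update["messages"]),  # reducer
    "summary": update["summary"],  # no reducer: last write wins
}

print([m.id for m in state["messages"]])
print(state["summary"])
```

The point of the sketch is that a single node return value can both delete old messages and overwrite the summary, because each field applies its own merge rule.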

3.2 How messages + summary Work Together

At any given moment the state holds two complementary pieces of context:

The dual-context model

summary carries the compressed history of everything that happened before the summarisation threshold was reached. messages carries the recent turns in full detail. The chat node appends the summary to the persona inside a single system message so the LLM sees both: it "remembers" the past without paying the token cost of replaying every message verbatim.

Concretely, the chat node's prompt construction looks like this:

system_content = self.chat_prompt  # Chef Aria persona
if summary:
    system_content += f"\n\nPrevious conversation summary:\n{summary}"
messages = [SystemMessage(content=system_content)] + list(state["messages"])
response = self.llm.invoke(messages)

The LLM receives: persona + summary + recent messages. The token budget is bounded because messages is periodically compressed and the summary is kept short (the summarise prompt asks for ≤150 words).

4. The Summarisation Pattern

The summarisation pattern has three interlocking pieces: a summarise node that uses the LLM to compress old messages, a router function that decides when to trigger it, and a conditional edge that wires the two together. This section walks through each piece before the complete example assembles them.

4.1 The Summarisation Node

The summarise node does three things in sequence:

Selects the old messages: everything except the two most recent messages (the latest user question and the assistant's reply).
Asks the LLM to compress them into a short paragraph (extending any existing summary).
Returns RemoveMessage objects for each old message so the add_messages reducer deletes them from state.

def summarize_node(self, state: RecipeState) -> dict:
    existing_summary = state.get("summary", "")
    old_messages = state["messages"][:-2]  # keep the last 2 messages intact

    lines = []
    for m in old_messages:
        role = "User" if m.type == "human" else "Assistant"
        lines.append(f"{role}: {_extract_text(m.content)}")

    prompt = self.summarize_prompt.format(
        existing_summary=existing_summary,
        conversation="\n".join(lines),
    )
    response = self.llm.invoke([HumanMessage(content=prompt)])
    new_summary = _extract_text(response.content)

    # Tell add_messages to delete the old messages by ID.
    delete_ops = [RemoveMessage(id=m.id) for m in old_messages]
    return {"summary": new_summary, "messages": delete_ops}

Why keep the last 2 messages?

The most recent user question and the assistant's answer are the active context. Folding them into the summary straight away would blur the immediate conversational thread, and the runner's chat() method reads the reply from the last message in the returned state, so that message has to survive. Keeping both in messages ensures the next turn has fresh context; everything older lives in the summary.

The summarise prompt (loaded from prompts/summarize.txt) uses two placeholders. If an existing summary is present it is extended rather than replaced, so the compression is incremental:

You are summarising a cooking conversation for long-term memory.

{existing_summary}

New conversation turns to incorporate:
{conversation}

Write a concise summary (≤150 words) that preserves all dietary restrictions, food preferences, recipes discussed, and cooking tips mentioned. Return only the summary text — no headings or preamble.
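The sketch below shows how the two placeholders get filled with Python's str.format, as the summarise node does. The template text is copied from the prompt above, but its exact line breaks are an assumption, and `render_turns` and `build_summary_prompt` are hypothetical helper names.

```python
# Template text taken from the prompt above; the line breaks are an assumption.
SUMMARIZE_TEMPLATE = (
    "You are summarising a cooking conversation for long-term memory.\n\n"
    "{existing_summary}\n\n"
    "New conversation turns to incorporate:\n"
    "{conversation}\n\n"
    "Write a concise summary (≤150 words) that preserves all dietary "
    "restrictions, food preferences, recipes discussed, and cooking tips "
    "mentioned. Return only the summary text."
)


def render_turns(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) pairs as 'Role: text' lines."""
    return "\n".join(f"{role}: {text}" for role, text in turns)


def build_summary_prompt(existing_summary: str, turns: list[tuple[str, str]]) -> str:
    """Fill both placeholders; an empty summary simply leaves a blank line."""
    return SUMMARIZE_TEMPLATE.format(
        existing_summary=existing_summary,
        conversation=render_turns(turns),
    )


first_pass = build_summary_prompt("", [("User", "I am vegetarian.")])
second_pass = build_summary_prompt(
    "Priya is vegetarian.",
    [("User", "Use {salt} sparingly, please."), ("Assistant", "Will do.")],
)
print(second_pass)
```

Two details are worth noting. Values substituted by str.format are not parsed again, so braces typed by the user (as in `{salt}`) pass through untouched. Literal braces inside the template file itself, however, would need to be doubled (`{{` and `}}`) to avoid a KeyError.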

4.2 When to Summarise: Conditional Routing

After the chat node responds, a router function inspects the message count. If it exceeds Config.SUMMARY_THRESHOLD (default 6), execution routes to summarize; otherwise the graph ends immediately.

def should_summarize(state: RecipeState) -> str:
    if len(state["messages"]) > Config.SUMMARY_THRESHOLD:
        return "summarize"
    return "end"

This function is registered as a conditional edge out of the chat node. The explicit mapping dict ensures both branches appear as labelled edges in the generated graph diagram:

graph.add_conditional_edges(
    "chat",
    should_summarize,
    {
        "summarize": "summarize",
        "end": END,
    },
)
graph.add_edge("summarize", END)

After summarize runs, the graph always ends — the summarised state is checkpointed automatically. The next user message starts a new graph execution with a clean message list (just the last 2 messages) and the accumulated summary.
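A quick simulation makes the timing of the router explicit. It is a sketch in plain Python that models only the message count: it assumes every turn adds exactly two messages and that the summarise node always leaves two behind, as in the code above. `summarisation_turns` is a hypothetical helper name.

```python
SUMMARY_THRESHOLD = 6   # default threshold used in this post
KEEP_LAST = 2           # messages the summarise node leaves in state


def summarisation_turns(total_turns: int, threshold: int = SUMMARY_THRESHOLD) -> list[int]:
    """Return the turns on which the router would pick "summarize"."""
    fired = []
    message_count = 0
    for turn in range(1, total_turns + 1):
        message_count += 2             # one user message + one assistant reply
        if message_count > threshold:  # same strict comparison as should_summarize
            fired.append(turn)
            message_count = KEEP_LAST  # older messages are removed from state
    return fired


print(summarisation_turns(10))               # [4, 7, 10]
print(summarisation_turns(10, threshold=8))  # [5, 9]
```

With the default threshold the summariser first runs at the end of turn 4 and then every third turn after that, because each cycle restarts from the two retained messages.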

4.3 End-to-End Flow

Here is the full lifecycle of a single user message inside this graph:

Turn-by-turn lifecycle

Restore — checkpointer loads RecipeState for the thread_id (messages + summary).

chat node — prepends summary as system context, calls the LLM, appends the AI reply to messages.

should_summarize — counts messages; routes to summarize if > threshold, else END.

summarize node (if triggered) — compresses old messages, writes new summary, deletes old messages via RemoveMessage.

Checkpoint — updated state is persisted. Next turn starts here.
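The five steps above can be sketched end to end with no external dependencies. This is a toy model under stated assumptions: a dict keyed by thread_id plays the role of the checkpointer, `fake_llm` is a hypothetical stand-in for the model, and the "summary" is just the concatenated user text rather than an LLM-written paragraph.

```python
from collections import defaultdict

THRESHOLD = 6

# In-memory stand-in for the checkpointer: one isolated state per thread_id.
checkpoints: dict[str, dict] = defaultdict(lambda: {"messages": [], "summary": ""})


def fake_llm(context: str) -> str:
    """Hypothetical stand-in for the model: reports how much context it saw."""
    return f"(reply based on {len(context)} chars of context)"


def chat(state: dict, user_text: str) -> None:
    """Chat step: add the user message, then the reply built from summary + messages."""
    state["messages"].append(("user", user_text))
    context = state["summary"] + "".join(text for _, text in state["messages"])
    state["messages"].append(("assistant", fake_llm(context)))


def summarize(state: dict) -> None:
    """Summarise step: fold all but the last 2 messages into the summary."""
    old = state["messages"][:-2]
    facts = " ".join(text for role, text in old if role == "user")
    state["summary"] = f"{state['summary']} {facts}".strip()
    state["messages"] = state["messages"][-2:]


def run_turn(thread_id: str, user_text: str) -> str:
    state = checkpoints[thread_id]          # 1. restore
    chat(state, user_text)                  # 2. chat node
    if len(state["messages"]) > THRESHOLD:  # 3. router
        summarize(state)                    # 4. summarise node (only if triggered)
    checkpoints[thread_id] = state          # 5. checkpoint
    return state["messages"][-1][1]


for i in range(1, 5):
    run_turn("priya", f"question {i}")
run_turn("marco", "carbonara tips?")

print(checkpoints["priya"])
print(checkpoints["marco"])
```

After four turns the "priya" thread holds two messages plus a summary of the first three questions, while the "marco" thread is untouched by any of it.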

5. Complete Example: Personal Recipe Assistant

The following sections walk through every file of the project — from state to graph to runner — and then show the console output you will see when you run the demo.

5.1 Architecture Overview

The recipe assistant graph has exactly two nodes and two possible exit paths:

💬 chat node: Receives the full state (summary + recent messages), calls Chef Aria via Gemini, and appends the AI reply to messages.

📝 summarize node: Triggered conditionally. Compresses old messages into a ≤150-word summary and deletes the originals with RemoveMessage.

🔀 should_summarize: Router function. Checks the message count after each chat turn and routes to summarize if it is > SUMMARY_THRESHOLD, else to END.

💾 SqliteSaver: Persists RecipeState to disk. Each thread_id keeps an isolated conversation history that survives process restarts.

5.2 Project Structure

advanced-1-conversational-chatbot/
├── config.py          # Model name, temperature, threshold, DB path
├── llm.py             # GeminiLLM wrapper, reads Config
├── state.py           # RecipeState (messages + summary)
├── nodes.py           # RecipeNodes: chat_node, summarize_node
├── graph.py           # RecipeGraph: builds graph, SqliteSaver, save_figure()
├── recipe_runner.py   # RecipeRunner: chat(), stream_chat(), demo __main__
├── app.py             # Gradio ChatInterface with streaming + New Session button
├── prompts/
│   ├── chat.txt       # Chef Aria system persona
│   └── summarize.txt  # Summarisation prompt with {existing_summary} + {conversation}
└── figure/            # Auto-created: graph.mmd + graph.png

5.3 State (state.py)

from typing import Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict


class RecipeState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    summary: str

Two fields, two jobs: messages accumulates and supports deletion; summary is overwritten each time the summarise node runs.

5.4 Nodes (nodes.py)

Both nodes live in a RecipeNodes class. The constructor loads the LLM and both prompts from prompts/:

import os

from langchain_core.messages import HumanMessage, RemoveMessage, SystemMessage

from llm import GeminiLLM
from state import RecipeState

_PROMPTS_DIR = os.path.join(os.path.dirname(__file__), "prompts")


def _load_prompt(filename: str) -> str:
    with open(os.path.join(_PROMPTS_DIR, filename), "r") as f:
        return f.read().strip()


def _extract_text(content) -> str:
    """Normalise langchain-google-genai 4.x content (list or str) to plain str."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            b.get("text", "")
            for b in content
            if isinstance(b, dict) and b.get("type") == "text"
        )
    return str(content)


class RecipeNodes:
    def __init__(self):
        self.llm = GeminiLLM().get_llm()
        self.chat_prompt = _load_prompt("chat.txt")
        self.summarize_prompt = _load_prompt("summarize.txt")
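The _extract_text helper is easiest to understand by feeding it a few shapes of content. The sketch below copies its logic into a standalone function named `extract_text` (a hypothetical name used so the example runs without the project files); the sample inputs are made up for illustration.

```python
def extract_text(content) -> str:
    """Same logic as _extract_text above: flatten str-or-list content to str."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return str(content)


samples = {
    "plain string": "Boil for 9 minutes.",
    "text blocks": [
        {"type": "text", "text": "Boil "},
        {"type": "text", "text": "for 9 minutes."},
    ],
    "mixed blocks": [
        {"type": "text", "text": "Done."},
        {"type": "image_url", "image_url": "https://example.com/pasta.png"},
    ],
    "bare strings in a list": ["Boil ", "for 9 minutes."],
}

for label, content in samples.items():
    print(f"{label!r}: {extract_text(content)!r}")
```

One behaviour to be aware of: list items that are bare strings rather than dicts are skipped, so the last sample comes back empty. If your model can return content in that shape, the helper would need an extra branch for str items.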

chat_node — builds the prompt from persona + optional summary + recent messages:

    def chat_node(self, state: RecipeState) -> dict:
        summary = state.get("summary", "")
        system_content = self.chat_prompt
        if summary:
            system_content += f"\n\nPrevious conversation summary:\n{summary}"
        messages = [SystemMessage(content=system_content)] + list(state["messages"])
        response = self.llm.invoke(messages)
        return {"messages": [response]}

summarize_node — compresses old messages and schedules their deletion:

    def summarize_node(self, state: RecipeState) -> dict:
        existing_summary = state.get("summary", "")
        old_messages = state["messages"][:-2]

        lines = []
        for m in old_messages:
            role = "User" if m.type == "human" else "Assistant"
            lines.append(f"{role}: {_extract_text(m.content)}")

        prompt = self.summarize_prompt.format(
            existing_summary=existing_summary,
            conversation="\n".join(lines),
        )
        response = self.llm.invoke([HumanMessage(content=prompt)])
        new_summary = _extract_text(response.content)

        delete_ops = [RemoveMessage(id=m.id) for m in old_messages]
        return {"summary": new_summary, "messages": delete_ops}

5.5 Graph Assembly (graph.py)

The router function and graph wiring are concise — the interesting work happens inside the nodes:

import os
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, START, StateGraph

from config import Config
from nodes import RecipeNodes
from state import RecipeState

FIGURE_DIR = os.path.join(os.path.dirname(__file__), "figure")


def should_summarize(state: RecipeState) -> str:
    if len(state["messages"]) > Config.SUMMARY_THRESHOLD:
        return "summarize"
    return "end"


class RecipeGraph:
    def __init__(self):
        self.nodes = RecipeNodes()
        self.compiled_graph = self._build()

    def _build(self):
        graph = StateGraph(RecipeState)
        graph.add_node("chat", self.nodes.chat_node)
        graph.add_node("summarize", self.nodes.summarize_node)
        graph.add_edge(START, "chat")
        graph.add_conditional_edges(
            "chat",
            should_summarize,
            {
                "summarize": "summarize",
                "end": END,
            },
        )
        graph.add_edge("summarize", END)
        conn = sqlite3.connect(Config.DB_PATH, check_same_thread=False)
        checkpointer = SqliteSaver(conn)
        return graph.compile(checkpointer=checkpointer)

    def save_figure(self):
        os.makedirs(FIGURE_DIR, exist_ok=True)
        mmd_path = os.path.join(FIGURE_DIR, "graph.mmd")
        with open(mmd_path, "w") as f:
            f.write(self.compiled_graph.get_graph().draw_mermaid())
        png_path = os.path.join(FIGURE_DIR, "graph.png")
        with open(png_path, "wb") as f:
            f.write(self.compiled_graph.get_graph().draw_mermaid_png())
        print(f" Graph saved → {mmd_path}")
        print(f" Graph saved → {png_path}")

    def get_compiled_graph(self):
        return self.compiled_graph

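sqlite3.connect(..., check_same_thread=False) is necessary because the connection is created on the thread that builds the graph, while Gradio runs the streaming handler on a different thread. By default, Python's sqlite3 module refuses to use a connection from any thread other than the one that created it.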
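The effect of the flag can be reproduced with the standard library alone. The sketch below uses an in-memory database and a throwaway `notes` table; `write_from_worker` and `make_conn` are hypothetical helper names, and nothing here touches SqliteSaver itself.

```python
import sqlite3
import threading


def write_from_worker(conn: sqlite3.Connection) -> str:
    """Try to use `conn` from a new thread and report what happened."""
    outcome = []

    def worker() -> None:
        try:
            conn.execute("INSERT INTO notes VALUES ('written by worker')")
            outcome.append("ok")
        except sqlite3.ProgrammingError as exc:
            outcome.append(f"ProgrammingError: {exc}")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


def make_conn(check_same_thread: bool) -> sqlite3.Connection:
    """Create an in-memory database on the calling (main) thread."""
    conn = sqlite3.connect(":memory:", check_same_thread=check_same_thread)
    conn.execute("CREATE TABLE notes (body TEXT)")
    return conn


strict = write_from_worker(make_conn(check_same_thread=True))
shared = write_from_worker(make_conn(check_same_thread=False))

print(strict)  # the default setting rejects cross-thread use
print(shared)  # with the check disabled, the worker can write
```

Disabling the check only removes the guard. The sqlite3 documentation notes that when a connection is shared across threads, write operations may need to be serialised by the caller to avoid data corruption.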

5.6 Runner & Console Output (recipe_runner.py)

RecipeRunner wraps the compiled graph and exposes the methods used by the CLI demo and the Gradio app: chat(), stream_chat(), get_history(), and get_summary().

from langchain_core.messages import HumanMessage

from graph import RecipeGraph
from nodes import _extract_text


class RecipeRunner:
    def __init__(self):
        self.recipe_graph = RecipeGraph()
        self.app = self.recipe_graph.get_compiled_graph()

    def save_figure(self):
        self.recipe_graph.save_figure()

    def _config(self, thread_id: str) -> dict:
        return {"configurable": {"thread_id": thread_id}}

    def chat(self, message: str, thread_id: str) -> str:
        result = self.app.invoke(
            {"messages": [HumanMessage(content=message)]},
            config=self._config(thread_id),
        )
        return _extract_text(result["messages"][-1].content)

    def stream_chat(self, message: str, thread_id: str):
        for chunk, metadata in self.app.stream(
            {"messages": [HumanMessage(content=message)]},
            config=self._config(thread_id),
            stream_mode="messages",
        ):
            # stream_mode="messages" emits tokens from every LLM call in the graph.
            # Forward only the chat node's tokens so the summarize node's output
            # does not leak into the visible reply.
            if metadata.get("langgraph_node") != "chat":
                continue
            if not hasattr(chunk, "content"):
                continue
            content = chunk.content
            if isinstance(content, str) and content:
                yield content
            elif isinstance(content, list):
                for block in content:
                    if isinstance(block, dict) and block.get("type") == "text":
                        text = block.get("text", "")
                        if text:
                            yield text

    def get_history(self, thread_id: str) -> list:
        state = self.app.get_state(self._config(thread_id))
        return state.values.get("messages", [])

    def get_summary(self, thread_id: str) -> str:
        state = self.app.get_state(self._config(thread_id))
        return state.values.get("summary", "")

The __main__ block runs three demos: multi-turn summarisation, thread isolation, and cross-session persistence. Here is the expected console output:

============================================================
LangGraph Advanced — Personal Recipe Assistant Demo
============================================================
Saving graph architecture...
 Graph saved → .../figure/graph.mmd
 Graph saved → .../figure/graph.png
────────────────────────────────────────────────────────────
Demo 1: Priya's cooking session (summarisation demo)
────────────────────────────────────────────────────────────
🙋 Priya (turn 1): Hi! I'm vegetarian and allergic to nuts. What's a quick dinner idea?
👩‍🍳 Chef Aria: Hi! Since you're vegetarian and allergic to nuts, a quick and delicious dinner...
🙋 Priya (turn 2): I have pasta, tomatoes, garlic, and spinach at home. Any recipe?
👩‍🍳 Chef Aria: Perfect! With pasta, tomatoes, garlic, and spinach you can make a simple...
🙋 Priya (turn 3): How long should I cook the pasta for al dente?
👩‍🍳 Chef Aria: For al dente pasta, cook it for 1-2 minutes less than the package...
🙋 Priya (turn 4): Can I add parmesan cheese and a squeeze of lemon to that pasta?
👩‍🍳 Chef Aria: Absolutely! Parmesan and lemon are wonderful additions...
📝 [Summarisation triggered — summary: 148 chars]
   Preview: Priya is vegetarian and allergic to nuts. She made a tomato-garlic-spinach pasta...
📊 State after 4 turns:
   Messages in state : 2 (older ones replaced by summary)
   Summary stored : yes
────────────────────────────────────────────────────────────
Demo 2: Marco's separate session (thread isolation)
────────────────────────────────────────────────────────────
🙋 Marco: I want to make a classic Italian carbonara. Any tips?
👩‍🍳 Chef Aria: Classic carbonara is all about technique! Here are the key tips...
✅ Marco's session has no knowledge of Priya's vegetarian preferences.
────────────────────────────────────────────────────────────
Demo 3: Priya's follow-up (SqliteSaver persistence)
────────────────────────────────────────────────────────────
🙋 Priya: I'm back! Can you suggest a weekend meal plan based on what we discussed?
👩‍🍳 Chef Aria: Welcome back! Based on your vegetarian diet and nut allergy...
✅ Chef Aria remembers Priya's preferences from the summary.
============================================================

5.7 Graph Diagram

The compiled graph has a clean two-node structure. save_figure() produces this Mermaid diagram:

flowchart TD
    S([__start__]) --> chat(chat)
    chat -.-> |"summarize"| summarize(summarize)
    chat -.-> |"end"| E([__end__])
    summarize --> E
    style S fill:#e8f5e9,stroke:#43a047,color:#1b5e20
    style E fill:#fce4ec,stroke:#e53935,color:#b71c1c
    style chat fill:#e3f2fd,stroke:#1e88e5,color:#0d47a1
    style summarize fill:#fff3e0,stroke:#fb8c00,color:#e65100

Reading the diagram

Solid arrows are unconditional edges. Dashed arrows are conditional edges — the router label shows which return value triggers each path. After chat, the graph either ends immediately (under the threshold) or passes through summarize first.

6. Web Interface

The Gradio UI wraps RecipeRunner in a streaming chat interface. Each user message yields tokens one by one so the reply appears progressively — the same streaming pattern from Basics Part 4. A New Session button generates a fresh thread_id so the user can start a brand-new conversation without clearing the database.

import uuid

import gradio as gr

from recipe_runner import RecipeRunner


class RecipeApp:
    def __init__(self):
        self.runner = RecipeRunner()

    def respond(self, message: str, _history: list, thread_id: str):
        if not message.strip():
            yield ""
            return
        accumulated = ""
        for token in self.runner.stream_chat(message, thread_id):
            accumulated += token
            yield accumulated

    def launch(self):
        with gr.Blocks(title="👩‍🍳 Personal Recipe Assistant") as demo:
            thread_state = gr.State(value=str(uuid.uuid4()))
            chat = gr.ChatInterface(
                fn=self.respond,
                title="👩‍🍳 Personal Recipe Assistant",
                description=(
                    "Ask Chef Aria for recipes, cooking tips, and meal plans. "
                    "Your conversation is saved — come back any time and she'll "
                    "remember your preferences."
                ),
                additional_inputs=[thread_state],
            )
            gr.ClearButton(
                [chat.chatbot, chat.textbox],
                value="🔄 New Session",
                variant="primary",
            ).click(
                fn=lambda: str(uuid.uuid4()),
                outputs=[thread_state],
            )
        demo.launch()


if __name__ == "__main__":
    RecipeApp().launch()

Key design decisions in the app:

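gr.State for thread_id: the thread_id is held in per-session state, so clicking New Session in one tab does not affect any other tab. Note that str(uuid.uuid4()) is evaluated once, when launch() builds the UI, so every tab opened against the same server process starts on the same thread until New Session is clicked. To give each visitor a fresh thread on page load, generate the UUID in a demo.load handler instead.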
Streaming via yield — respond() accumulates tokens and yields the growing string each time, satisfying Gradio's streaming contract for ChatInterface.
New Session button — replaces the thread_id in gr.State with a new UUID. The old conversation stays in the database but the UI starts fresh.
No theme= argument — passing theme= to ChatInterface raises a TypeError; styling should be handled via gr.Blocks wrapping.
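The streaming contract mentioned in the list above is easy to get wrong, so here is a minimal sketch of it without Gradio. `fake_token_stream` is a hypothetical stand-in for RecipeRunner.stream_chat(), and this `respond` mirrors only the accumulation loop of the method shown earlier.

```python
from typing import Iterator


def fake_token_stream() -> Iterator[str]:
    """Hypothetical stand-in for RecipeRunner.stream_chat()."""
    yield from ["Try ", "a ", "spinach ", "pasta."]


def respond(tokens: Iterator[str]) -> Iterator[str]:
    """Yield the reply-so-far after each token, not the token itself."""
    accumulated = ""
    for token in tokens:
        accumulated += token
        yield accumulated


frames = list(respond(fake_token_stream()))
for frame in frames:
    print(repr(frame))
```

Each yielded value replaces the message currently displayed, which is why the generator yields the growing string. Yielding the raw tokens instead would make the chat bubble show only the latest fragment.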

Run the web app with:

cd langgraph/advanced-1-conversational-chatbot
python app.py

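Open the Gradio URL printed in the terminal (default http://127.0.0.1:7860). Ask Chef Aria several questions, then close the browser tab, reopen it, and continue the conversation. While the same server process is running, the new tab starts on the same thread_id, so Aria restores your preferences from the summary stored in recipe_chat.db. After a server restart the app generates a new thread_id; the earlier conversation is still in the database, but reaching it requires reusing the old thread_id.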


Fig. 1 — Chef Aria streaming a recipe suggestion. The New Session button (top right) starts a fresh thread without erasing the database.

6.1 What to Try

Use the following sequence to see every feature in action — summarisation, cross-turn memory, thread isolation, and cross-session persistence:

Turn 1: "I'm vegetarian and I'm allergic to nuts. Keep that in mind."

Observe: Chef Aria acknowledges your preferences. Every recipe from this point will be vegetarian and nut-free, without you repeating it.

Turn 2: "Suggest a quick dinner I can make in 30 minutes."

Observe: The suggestion respects both constraints from turn 1, so memory is already working across turns.

Turn 3: "Can you give me the full recipe with quantities for that?"

Observe: Chef Aria knows which dish she just suggested and expands it; there is no need to repeat the dish name.

Turns 4 to 8: Keep asking about variations, ingredient substitutions, and cooking tips.

Observe: Once the message count exceeds SUMMARY_THRESHOLD, the summarise node fires silently. Old messages are compressed, but Chef Aria still remembers your dietary restrictions from turn 1.

New Session: Click New Session, then ask: "What dietary restrictions do you know about me?"

Observe: Chef Aria has no memory. A fresh thread_id starts completely clean, proving thread isolation works.

7. Conclusion

You have built a production-grade conversational chatbot that handles the two hardest problems in long-running assistants: cross-session persistence (via SqliteSaver) and bounded token usage (via the messages + summary pattern). The design is deliberately minimal — two nodes, one conditional edge, and two state fields — yet it scales to real workloads where users return day after day with growing conversation histories.

The pattern is completely general. To adapt it for a different domain, replace prompts/chat.txt with a new persona, update prompts/summarize.txt to capture the right kind of facts, and rename the classes. The graph structure, state design, and streaming plumbing stay exactly the same.

A summary field in RecipeState carries compressed history across summarisation cycles without the token cost of storing full messages verbatim.
The summarize_node incrementally compresses old messages using the LLM and deletes originals with RemoveMessage — keeping only the last 2 messages in full.
The should_summarize router sends execution to summarize only when the message count exceeds the threshold, so normal turns make no extra LLM call.
SqliteSaver persists RecipeState across process restarts; each thread_id is a completely isolated conversation.
Token-by-token streaming via stream_chat() and a Gradio gr.Blocks UI delivers a responsive, production-ready chat experience.

LangGraph Advanced Series — Part 1 of 5

This post is Part 1 of the LangGraph Advanced series. The remaining four parts build on the bounded-memory foundation established here:

Part 2 — Multi-Agent Architectures: coordinating specialised LangGraph graphs

Part 3 — Tool Use & ReAct Agents: binding and calling real-world tools

Part 4 — Human-in-the-Loop: interrupt/resume patterns for approval workflows

Part 5 — Subgraphs & Parallel Execution: nesting graphs and running branches concurrently