Stateful MCP Servers on ECS Fargate: What Happens When You Deploy

A few weeks back I was working on a PoC with Bedrock AgentCore Runtime. While doing that I came across multiple blogs and discussions around MCP server hosting on AWS. Most of them were pointing to either Bedrock AgentCore or Lambda. Very few talked about ECS Fargate.
That got me thinking. I have been using Fargate for containerised workloads for a while now. It is my go-to when a team needs containers without managing the underlying infrastructure. So the question came naturally — can Fargate host a stateful MCP server? And more importantly, what happens when you actually deploy it in a real scenario?
As an architect I believe you should know all the options before recommending one. Not just what the docs say — what actually happens when you run it. So I decided to test it myself.
This blog covers what I found: specifically, what happens when you run a stateful MCP server on ECS Fargate and then do a rolling deployment while a session is active. The results were not what I expected.
MCP hosting on AWS — what are your options?
Before jumping into the experiment, let me give some context on why Fargate and not the other options.
When it comes to hosting MCP servers on AWS you have three realistic paths:
Bedrock AgentCore Runtime is AWS's managed MCP hosting service. You write your MCP server, deploy it, and AgentCore handles session isolation at the platform level. It supports both stateless and stateful MCP servers. By default stateless mode is recommended — AgentCore automatically adds an Mcp-Session-Id header and manages connection continuity at the platform level. For multi-turn interactions that need session state preserved across requests, stateful mode (stateless_http=False) is available and the runtime handles session preservation within the same invocation. The key difference from running stateful MCP on Fargate yourself: AgentCore manages the session layer for you regardless of mode. You are not responsible for sticky sessions, deregistration delays, or what happens to your session during a platform update. That operational burden stays with AWS.
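For reference, here is what that flag looks like on the server side. A minimal sketch, assuming the mcp Python SDK's FastMCP, which exposes stateless_http as a constructor option:
import httpx  # not needed here, shown for contrast with the client sketches below
from mcp.server.fastmcp import FastMCP

# The same server code runs in either mode; the flag decides whether
# session state may live in process memory between requests.
mcp = FastMCP("demo-server", stateless_http=True)    # stateless: the recommended default
# mcp = FastMCP("demo-server", stateless_http=False) # stateful: multi-turn session state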
AWS Lambda comes in two modes now and the difference matters for MCP.
Standard Lambda is stateless by nature. Cold starts are a latency concern — and since August 2025 also a cost concern, as AWS now bills the INIT phase the same as invocation duration. For lightweight or infrequent MCP tool calls this is still simple and cost-effective. But for agent workloads where a session expects low latency tool calls, standard Lambda cold starts can be disruptive.
Lambda Managed Instances (LMI) changes the picture. LMI runs your Lambda functions on EC2 instances in your own account — AWS still manages the instance lifecycle, patching and scaling, but your functions run on longer-lived compute. The result: no cold starts at all, multi-concurrency support where each execution environment handles multiple invocations simultaneously, and EC2-based pricing which can be significantly cheaper for steady-state workloads.
For MCP specifically, LMI is an interesting option for lighter workloads that need low latency tool calls without cold start risk, while keeping the serverless programming model. The constraint is the same as standard Lambda — stateless by nature, so session context still has to live somewhere else. But the cold start objection largely disappears with LMI.
LMI is designed for steady-state predictable workloads — it scales more gradually than standard Lambda and does not burst instantly. If your MCP workload has very spiky or unpredictable traffic, standard Lambda or Fargate may still be better suited.
ECS Fargate gives you your own container, your own session model, your own trade-offs. Fits teams already running Fargate workloads, teams with compliance or data residency requirements, or teams building something the managed service does not support yet. More control, more responsibility.
I chose Fargate because I already use it and wanted to understand what it actually does with stateful MCP under real conditions — not a happy path demo.
Setting up the experiment — with help from Kiro
When I started looking at the AWS sample repository for stateful MCP on ECS — aws-samples/sample-serverless-mcp-servers — I found it was SAM-based. It also expected VPC, CIDR, ALB and other networking prerequisites to be in place before running sam deploy. That meant doing a lot of manual setup before I could even start the experiment.
I did not want to spend my weekend debugging SAM prerequisites. I wanted to get to the actual experiment.
So I decided to build the infrastructure from scratch. And this is where Kiro helped. I used Kiro — AWS's agentic IDE — to scaffold the entire experiment setup: the FastMCP server, the CDK infrastructure including VPC, ALB, ECS cluster and Fargate task definition, and the test client.
Here is what I built:
A stateful FastMCP server in Python holding session state in memory
ALB with sticky sessions enabled — lb_cookie type, 1 day duration
ECS Fargate service with 2 tasks and rolling deployment configured
A test client using httpx with a persistent cookie jar, making continuous tool calls every 5 seconds
Task ID instrumented in every tool response by fetching from the ECS container metadata endpoint
import httpx, os
metadata = httpx.get(os.environ["ECS_CONTAINER_METADATA_URI_V4"] + "/task").json()
TASK_ID = metadata["TaskARN"].split("/")[-1] # fetched once at server startup
I deliberately did not add a SIGTERM handler, did not externalise session state, and did not add any retry logic. I wanted to observe the default — what the pattern actually does out of the box before any hardening. The test client ran two operations per cycle: set_session_value to write state, followed immediately by get_session_state to read it back and confirm. Session state accumulated across calls — seq_1, seq_2, seq_3 and so on — so any loss of state would be immediately visible.
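For illustration, the shape of that client loop was roughly the following. This is a sketch, not the exact repo code: the endpoint URL is a placeholder, the argument names are from my setup, and it assumes the MCP session was already initialised over the streamable HTTP transport:
import time
import httpx

MCP_URL = "http://YOUR_ALB_DNS/mcp"  # placeholder endpoint
SESSION_ID = "..."                   # captured from the initialize response header

client = httpx.Client()  # persistent cookie jar: AWSALB updates on every response

def call_tool(name, arguments, rpc_id):
    # MCP streamable HTTP: JSON-RPC 2.0 over POST, session carried in a header
    return client.post(
        MCP_URL,
        headers={
            "Mcp-Session-Id": SESSION_ID,
            "Accept": "application/json, text/event-stream",
        },
        json={
            "jsonrpc": "2.0",
            "id": rpc_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        },
    )

call_number = 0
while True:
    call_number += 1
    # write state, then immediately read it back to confirm
    write = call_tool("set_session_value",
                      {"key": f"seq_{call_number}", "value": f"call_{call_number}"},
                      rpc_id=2 * call_number - 1)
    read = call_tool("get_session_state", {}, rpc_id=2 * call_number)
    print(call_number, write.status_code, read.status_code)
    time.sleep(5)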
What I observed
The setup, confirmed
Before triggering any deployment I confirmed the ALB configuration: stickiness enabled, lb_cookie type, 86400 second duration, deregistration delay at its 300 second default. Everything was configured exactly as the documentation recommends. If you are new to sticky sessions and why they matter for stateful workloads, the AWS Prescriptive Guidance on load balancer stickiness is a good starting point.
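In CDK terms this amounts to a single call. A sketch in Python, assuming an existing ApplicationTargetGroup named target_group:
from aws_cdk import Duration

# Enables lb_cookie stickiness with a 1 day duration on the ALB target group.
# The AWSALB cookie that shows up in the logs below comes from this setting.
target_group.enable_cookie_stickiness(Duration.days(1))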
I then started the test client and let it run. Then mid-session I triggered a forced rolling deployment:
aws ecs update-service \
--cluster YOUR_CLUSTER \
--service YOUR_MCP_SERVICE \
--force-new-deployment
Here is what the logs showed.
Finding 1: The cookie rotation red herring
The first thing I noticed in the logs was the AWSALB cookie changing on every single response — from call 1, before any deployment was triggered.
{"call_number": 1, "tool": "set_session_value", "http_status": 200,
"task_id": "507bf31b..."}
{"event": "session_cookie_changed", "cookie_name": "AWSALB",
"old_cookie": "OWXI55yd...", "new_cookie": "XItyHvSg..."}
{"call_number": 2, "tool": "set_session_value", "http_status": 200,
"task_id": "507bf31b..."}
{"event": "session_cookie_changed", "cookie_name": "AWSALB",
"old_cookie": "XItyHvSg...", "new_cookie": "md8WEI7W..."}
Cookie changing every call. Naturally the first instinct is — stickiness is broken. Requests are bouncing between tasks.
But look at the task ID. 507bf31b — same on every single successful call across all 39 calls before failure. The ALB was routing to the same task the entire time despite the cookie changing.
What is actually happening: the ALB re-encrypts the sticky cookie token on every response even when routing to the same target. The cookie value rotates but the target it encodes stays the same. This is normal ALB behaviour — it is not routing instability.
Engineering judgment: If you see cookie rotation in your logs and start debugging stickiness, you will spend days on the wrong problem. The cookie value is irrelevant. The target it encodes is what matters. Verify using task ID in your responses, not by watching the cookie.
This also has an important implication: any MCP client that captures the sticky cookie once at session initialisation and reuses it without updating — which is the natural implementation — will break stickiness the moment it sends a stale cookie value. The ALB will treat it as a new session and route via round-robin. With 2 tasks running that means a 50% chance of landing on the wrong task on every call.
My test client used httpx.Client with a persistent cookie jar that automatically updates on every response. That is what kept the session alive across 39 calls. The aws-samples repo mentions patching for cookie handling — but does not explain why updating on every response is critical, not just at session init.
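The difference between the two patterns is small in code but decisive in behaviour. A minimal illustration, with BASE_URL, INIT_REQUEST and TOOL_CALL as placeholders:
import httpx

# Broken: capture AWSALB once at session init and replay it forever.
# After the first rotation the value is stale, the ALB treats the request
# as a new session, and routing falls back to round-robin.
first = httpx.post(BASE_URL, json=INIT_REQUEST)
frozen_cookies = {"AWSALB": first.cookies["AWSALB"]}  # never updated again

# Working: a persistent client whose jar absorbs every Set-Cookie header,
# so each request carries the latest rotated value.
client = httpx.Client()
client.post(BASE_URL, json=TOOL_CALL)  # jar updated automatically on response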
Finding 2: The atomic failure
This is the central finding.
At 15:20:12 UTC, call number 39's set_session_value succeeded:
{
"timestamp": "2026-04-26T15:20:12.067393+00:00",
"call_number": 39,
"tool": "set_session_value",
"http_status": 200,
"task_id": "507bf31beb2f41abae593f5cfd023b5e",
"state": {"seq_1": "call_1", "...": "...", "seq_39": "call_39"}
}
Five seconds later, call number 39's get_session_state failed:
{
"timestamp": "2026-04-26T15:20:17.134593+00:00",
"call_number": 39,
"tool": "get_session_state",
"http_status": 404,
"error": {"code": -32600, "message": "Session not found"}
}
Same call number. Same MCP session ID. Same logical operation — write then read. The write succeeded on task 507bf31b. The read landed on the new task. The new task had no knowledge of that session. 404.
The gap was 5 seconds. In those 5 seconds the deregistration delay expired, the old task was removed from the ALB target group, and the next request was routed to the replacement task.
This is not an eventual consistency problem. This is an atomic operation split across a task boundary.
An AI agent that writes state and immediately reads it back to confirm — which is the natural pattern for any tool that modifies and verifies — cannot do so safely across a deployment boundary. The write may have landed on a task that no longer exists by the time the read arrives. The agent cannot tell the difference between "session not found because I have a bug" and "session not found because my task was replaced 5 seconds ago." It cannot retry safely. It cannot roll back. The state is in an unknown condition.
Finding 3: Your monitoring will show nothing
This is what makes this failure mode operationally dangerous.
During the entire failure sequence — calls 39 through 50, all returning 404 — here is what your monitoring shows:
ALB: healthy targets, no 5xx errors
ECS service: desired count met, tasks running
CloudWatch alarms: nothing triggered
ECS service events: deployment completed successfully
The failure is code: -32600 "Session not found" — a JSON-RPC application error, not an HTTP error. Your ALB access logs show 404 responses but 404 is not typically alarmed in most setups. And even if it is, the error message is indistinguishable from a bug in your tool implementation.
Your on-call engineer will look at the infrastructure dashboard and see green. Your application engineer will look at the error and check their code. Both will find nothing wrong. The failure lives in the gap between the deployment event and the application layer — and nothing connects them automatically.
Engineering judgment: If you are running stateful MCP on Fargate you need an application-level alarm specifically on -32600 errors correlated with deployment events. Infrastructure health checks will not catch this.
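One way to wire that alarm up, sketched with boto3 and assuming the server logs tool responses as JSON with an error.code field:
import boto3

logs = boto3.client("logs")

# Counts JSON-RPC "Session not found" errors in the service's log group.
# Alarm on the resulting metric and correlate spikes with deployment events.
logs.put_metric_filter(
    logGroupName="/ecs/mcp-server",  # placeholder log group name
    filterName="mcp-session-not-found",
    filterPattern='{ $.error.code = -32600 }',
    metricTransformations=[{
        "metricName": "McpSessionNotFound",
        "metricNamespace": "MCP",
        "metricValue": "1",
    }],
)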
One more safety net that will not help here: the ECS deployment circuit breaker. The circuit breaker triggers on tasks that fail to reach RUNNING state or fail health checks. In this failure mode your new task is RUNNING, your health check passes, and ECS considers the deployment successful. The circuit breaker has no visibility into whether active MCP sessions were lost during the transition. The failure passes every gate AWS provides automatically.
Finding 4: The deregistration delay is your session cliff timer
AWS documents the deregistration delay as a connection draining setting. For stateful MCP on Fargate it is actually your session survival window — the countdown timer between when a deployment starts and when your session dies.
Across my runs with different configurations:
| Run | Tasks | Deregistration delay | Session survived until |
|---|---|---|---|
| 1 | 1 | 300s (default) | Call 47 — ~61s after trigger |
| 2 | 1 | Changed | Call 48 — ~4 min after trigger |
| 3 | 2 | 300s | Call 39 — ~3.5 min after trigger |
| 4 | 2 | 300s | Call 39 — atomic failure at 5s gap |
The deregistration delay controlled the survival window in every run. Not the stickiness duration (86400 seconds — that number is fiction during deployments). Not the task count. The deregistration delay alone.
But here is the honest conclusion: no value of deregistration delay removes the failure. It only changes when the cliff arrives. A 30 second delay means your session cliff is 30 seconds after deployment. A 900 second delay means your session survives longer but your old tasks linger for 15 minutes, slowing rollbacks and increasing cost. You are not solving the problem — you are choosing when to accept the loss.
One more thing worth noting here: Fargate's default stopTimeout is 30 seconds (AWS reference). If you do not set a SIGTERM handler and raise this value, ECS will SIGKILL the container within 30 seconds of sending SIGTERM — regardless of your deregistration delay. So even if you set a 300 second deregistration delay, an unhandled SIGTERM means your session gets a hard kill within 30 seconds. The deregistration delay and stopTimeout work together — both need to be tuned, not just one.
A minimal SIGTERM handler in FastMCP looks like this:
import signal
import sys
import time

def handle_sigterm(signum, frame):
    print("SIGTERM received — draining active sessions")
    time.sleep(25)  # stay alive within the stopTimeout window before exiting
    sys.exit(0)     # exit cleanly once draining is done

signal.signal(signal.SIGTERM, handle_sigterm)
The sleep value must be less than your stopTimeout setting. If stopTimeout is 30 seconds (default) and you sleep 25, the handler completes cleanly. If you forget to raise stopTimeout above 30 seconds and sleep longer, SIGKILL fires before the handler finishes.
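Both knobs live in the infrastructure definition, so they are easy to tune together. A CDK (Python) sketch, assuming the task definition and target group objects already exist in the stack:
from aws_cdk import Duration, aws_ecs as ecs

# stopTimeout: how long ECS waits between SIGTERM and SIGKILL.
# Must exceed whatever your SIGTERM handler sleeps.
container = task_definition.add_container(
    "mcp-server",
    image=ecs.ContainerImage.from_registry("YOUR_IMAGE"),  # placeholder image
    stop_timeout=Duration.seconds(120),
)

# Deregistration delay: how long the ALB keeps draining the old task.
target_group.set_attribute("deregistration_delay.timeout_seconds", "300")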
One related consideration worth flagging: if your health check endpoint and MCP handler run in separate processes or on different ports, a new task can pass the ALB health check before the MCP handler is fully initialised — ECS has no native readiness probe separation the way Kubernetes does. In my implementation both run in the same uvicorn process on port 8000, so if the health check passes the MCP handler is already up. But if your setup is different, design for this explicitly.
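For reference, the single-process arrangement looks roughly like this. A sketch assuming the mcp SDK's streamable_http_app() and a plain Starlette route, with mcp being the FastMCP instance defined elsewhere:
from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Mount, Route

async def health(request):
    # Served by the same uvicorn process and port as the MCP handler,
    # so a passing ALB health check implies the handler is already up.
    return PlainTextResponse("ok")

app = Starlette(routes=[
    Route("/health", health),
    Mount("/", app=mcp.streamable_http_app()),  # mcp: the FastMCP instance
])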
What this means architecturally
You have three honest options. I will be clear about which ones I have tested and which are architectural paths for a follow-up.
Option A — Design for the failure
Make your MCP tools idempotent. If a write-then-read pair fails, the client can retry the full operation safely without risk of duplicate side effects. This works for tools that are naturally idempotent — read-heavy tools, query tools, lookup tools. It fails for tools that modify external state once — sending a message, creating a record, triggering a payment. If your agent workflow has side effects, idempotency alone is not enough.
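If the tools are idempotent, the client-side recovery is straightforward. Sketched below, with ensure_session and call_tool as hypothetical helpers (ensure_session re-runs the MCP initialize handshake):
def safe_write_read(key, value, retries=3):
    # On "Session not found", re-initialise and replay the full
    # write-then-read pair. Safe only if the tools are idempotent.
    for attempt in range(retries):
        session_id = ensure_session()  # hypothetical: re-runs MCP initialize
        write = call_tool("set_session_value", {"key": key, "value": value}, session_id)
        read = call_tool("get_session_state", {}, session_id)
        if read.status_code != 404:
            return read
    raise RuntimeError("session lost on every attempt; tasks may still be cycling")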
Option B — Externalise session state
Move session storage to ElastiCache Redis or DynamoDB. The session is no longer tied to a specific task — any task can serve any session. Rolling deployments become safe because the new task can find the session in the external store. This eliminates the failure mode entirely.
The cost: the MCP SDK does not support external session persistence natively. You need to patch the session layer. Every tool call now has an external store read/write on the hot path — latency increases. Operational complexity increases. This is the right answer for multi-turn agent workflows that genuinely cannot tolerate session loss. I have not built this yet — it is the subject of a follow-up experiment.
Option C — Go stateless, let the platform handle sessions
This is what Bedrock AgentCore chose. Stateless MCP server, session isolation managed at the platform layer. The application never owns session state — the infrastructure does. Zero risk of the failure mode I described above.
The cost: you give up control over the session model. You take on the constraints of the managed service. If you have compliance requirements around data residency or need session behaviour the platform does not support, this path is not available to you.
So is Fargate a good fit for stateful MCP?
It depends — but not in the vague way that phrase usually means. Here is a more specific answer:
Fargate is a good fit if your MCP tools are idempotent and session loss during deployments is acceptable or recoverable in your workflow.
Fargate with externalised session state is a good fit if you need stateful multi-turn sessions, have compliance or control requirements that rule out managed services, and are willing to own the additional complexity.
Fargate with in-memory stateful sessions and the default configuration is not production-ready for agent workloads that cannot tolerate session loss. The AWS sample pattern works. Until you deploy. And in production, you deploy all the time.
If you are building something lighter — a few tools, mostly stateless, occasional multi-turn — Fargate is capable and operationally straightforward. If you are building something larger — long-running agent sessions, complex state, frequent deployments — you need to solve the session persistence problem before you go to production.
That is the answer I was looking for when I started this. Now I have it.
What is next
The experiment is not finished. The next step is to actually build Option B — externalise session state to Redis, run the same deployment experiment, and show whether the atomic failure disappears. That blog will have the same structure: real logs, real task IDs, real failure or real fix.
If you are trying to make this decision for a real workload and want to talk through it, find me on X or LinkedIn.
All experiment code is available at Stateful MCP Server on ECS Fargate - GitHub: the test client, the CDK infrastructure, and the FastMCP server with task ID instrumentation.




