If you’re just using a managed MCP server, such as Vendia’s free tier product, you don’t have to know or care how it works – just start connecting your AI client(s) and backend resources and start building AI applications. But if you’re building your own MCP server from scratch and are thinking through implementation approaches and/or you’d like to understand what drove our design considerations, read on!
“Serverless MCP Server” sounds a bit silly, but it’s actually a compelling design pattern for building and scaling these new bridges between AI clients and legacy enterprise resources. There are several reasons why:
- An MCP server is a gateway, not an application or storage system itself. With a few exceptions (we’ll come back to these below), MCP servers are a bridge between AI clients and enterprise systems and resources. Requests and data transit through them, but they aren’t generally storing data or providing the underlying implementation themselves.
- All MCP server implementations are “green”. MCP is new, and so MCP implementations are also new; there are no decades-old “legacy MCP servers” that need to keep running a certain way.
- Most MCP-based requests are synchronous and short-lived. While there are some asynchronous elements to the MCP protocol, such as resource notifications, existing AI clients primarily make short-lived, synchronous requests with relatively simple state management, such as retrieving a paginated result.
- MCP-managed state is generally modest and changes slowly. The state an MCP server itself would typically manage, such as user or resource permissions, is usually of a “control plane” nature.
- The MCP protocol is “serverless friendly”. The protocol allows both clients and servers to indicate that they need to end an existing session, and it specifies clear security behavior for identity-related credentials when a session is terminated from either end.
- Function “cold start” latency is generally a nonissue in AI client contexts. AI clients can take several seconds to think about a response – longer when large amounts of subplanning are needed or complex outputs like images must be generated. In this context, a “cold start” – the extra latency experienced when a new instance of a serverless function starts up – is generally not problematic: the difference between a Lambda function’s overhead being 8 milliseconds vs. 1 second is usually not critical given the typical latency of LLM conversational turnaround.
These structural characteristics don’t require an MCP server implementation to use serverless infrastructure such as AWS Lambda functions, but they enable it – if that’s the team’s design preference. At Vendia we like serverless architectures for their scalability and simplicity. Not having to rent, manage, upgrade, cycle, monitor, and scale servers is a big win for a startup. But we also like this approach because it encourages good distributed system practices: Since serverless functions can’t keep long-lived state in memory or on disk, developers are forced to put that state somewhere else (such as a database), which usually makes it safer and more reliable than inside an individual server that could crash or get restarted at any point. Even when building with “stateful servers”, it’s useful to approach things as if they were serverless to keep the system as fault tolerant and horizontally scalable as possible.
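The “put state somewhere else” discipline described above can be sketched in a few lines. This is an illustrative example only, not Vendia’s implementation: a stateless request handler that loads session state from an external store, does its work, and persists state back before returning (a plain dict stands in for a real database such as DynamoDB; `handle_request` and `SESSION_STORE` are hypothetical names).

```python
# Sketch: a stateless MCP-style handler. All session state lives in an
# external store, so any function instance can serve any request.
SESSION_STORE = {}  # stands in for an external database (e.g. DynamoDB)

def handle_request(session_id: str, method: str, params: dict) -> dict:
    """Load state, do the work, persist state back, and exit.

    Nothing survives in memory between invocations, which is exactly the
    constraint a serverless platform imposes.
    """
    state = SESSION_STORE.get(session_id, {"cursor": 0})
    if method == "list_items":
        items = params["items"]
        page_size = params.get("page_size", 2)
        start = state["cursor"]
        page = items[start:start + page_size]
        state["cursor"] = start + len(page)
        SESSION_STORE[session_id] = state  # persist before returning
        return {"items": page, "done": state["cursor"] >= len(items)}
    raise ValueError(f"unknown method: {method}")
```

Because the cursor is persisted externally, a second call for the same session continues the pagination even if it lands on a completely different function instance.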
Sessions and Streaming
One of the benefits of moving from a totally stateless MCP server to enabling session support is that it makes streaming possible. Instead of forcing the client to wait for a full response, the server can return progress notifications and partial results as they become available. For long-running queries, bulk data fetches, or calls to slow backends, this makes the interaction feel responsive and resilient. If a connection drops, the client can reconnect and resume from where it left off; if the user cancels, the server can stop the work and release resources.
In a serverless environment this pattern can still work cleanly: session metadata and stream cursors live in external storage, while individual function invocations remain stateless. Each new request can continue the session, stream results back as they’re ready, and then exit without losing continuity. Platforms like AWS Lambda (with Function URLs) and Vercel support serverless streaming, making these approaches practical to implement in production.
When Not to Go Serverless: Contraindications and Other Approaches
Are there reasons not to use a serverless approach when architecting a new MCP server implementation? Here are some situations where the right choice isn’t clear-cut and you might want to take a broad view of all the alternatives:
- Local state – Despite wanting to minimize the need for local state, sometimes it’s hard (or not worth trying) to avoid. An example would be a large data item being retrieved through MCP which requires a smaller pagination window on the MCP side versus on the backend. Managing some temporary state in the MCP server itself may be the simplest solution in these cases, and depending on the details might be simpler to accomplish with a conventional server than with a serverless function. Local caches may also be easier to implement than deploying or creating a separate caching service just to maintain a serverless implementation.
- “Fat clients” and resource multiplexing – A special case of local state is when the MCP server must host a client of its own to connect with some enterprise resource. Legacy clients, especially “layer 4” solutions, can require substantial time to initialize and are often subject to connection limits that require pooling (multiplexing inbound requests onto a limited number of backend connections). While AWS offers solutions for some common cases of connection pooling, it’s still very possible to encounter legacy systems that make serverless approaches more difficult and that are easier to manage on conventional “serverful” architectures.
- Hybrid approaches – While it can be tempting to think of the “server vs serverless” decision as all or nothing, sometimes a mix is best. For example, an MCP server handling a complex and disparate set of requests might want to set up a “router”, sending stateless requests to Lambda functions while directing stateful ones to a Fargate instance. This approach can also be used to “pin” users or sessions while they’re engaging in a stateful activity, while still getting the advantage of letting most users be served by any available serverless function for the majority of traffic.
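The hybrid “router” idea in the last bullet can be sketched as a simple dispatch layer. This is an illustrative sketch only: the method names, the `STATEFUL_METHODS` set, and the session-pinning rule are all hypothetical, not part of the MCP specification.

```python
# Sketch: a hybrid router that sends stateless requests to serverless
# functions and stateful ones (or already-pinned sessions) to a
# long-lived instance such as Fargate.
STATEFUL_METHODS = {"resources/subscribe", "jobs/start_long_running"}
PINNED_SESSIONS = set()  # sessions currently engaged in stateful work

def route(method: str, session_id: str) -> str:
    """Return which backend should serve this request."""
    if method in STATEFUL_METHODS:
        PINNED_SESSIONS.add(session_id)  # pin for the stateful activity
        return "fargate"
    if session_id in PINNED_SESSIONS:
        return "fargate"  # keep pinned sessions on their stateful backend
    return "lambda"  # majority of traffic: any available function
```

Most traffic takes the serverless path, while the minority of stateful work gets the affinity it needs.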
Conclusion
At Vendia we’re passionate about helping to deliver data solutions for AI and helping to build a robust ecosystem of easy-to-use MCP solutions for companies of all sizes. MCP is a dynamic and evolving standard, and we’re excited to be growing along with it! To learn more, visit us at www.vendia.com.