Is there a way in MCP to stream an LLM response chunk by chunk back to the client?

I'm using FastMCP in Python to implement an MCP server. I've run into a problem with streaming the tokens generated by the LLM: I don't want to wait for the complete response and return the whole text at once; I'd rather stream it chunk by chunk to improve the perceived response time for the user. This seems like a valid use case to me when working with MCP.

I don't have control over the client, since I'm using clients like LibreChat or Open WebUI to connect to the MCP server.

In Open WebUI, for example, you can implement a pipeline that supports streaming (yielding chunk by chunk); a rough sketch is below.
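
For reference, this is roughly what such a streaming pipeline looks like. The pipe signature follows my reading of the examples in the open-webui/pipelines repository, so treat the exact parameter names as approximate:

from typing import Generator, Iterator, List, Union

class Pipeline:
    def __init__(self):
        self.name = "Streaming Example"

    # Parameter names follow the open-webui/pipelines examples (approximate).
    def pipe(
        self, user_message: str, model_id: str, messages: List[dict], body: dict
    ) -> Union[str, Generator, Iterator]:
        # Returning a generator lets Open WebUI render the chunks as they arrive.
        for word in f"Streaming a reply about {user_message}".split():
            yield word + " "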

Here is a minimal example of my use case:

from fastmcp import FastMCP
from llama_index.llms.lmstudio import LMStudio
import asyncio

mcp = FastMCP()

@mcp.tool()
async def story_teller(topic: str):
    llm = LMStudio(model_name="qwen/qwen3-4b-2507")
    prompt = f"Tell me a story about {topic}."
    async for chunk in await llm.astream(prompt):
        yield chunk

if __name__ == "__main__":
    asyncio.run(mcp.run_async(host="0.0.0.0", port=8001, transport="streamable-http"))

Unfortunately, none of the MCP clients I tested can handle the response from the server:

<async_generator object story_teller at 0x000002DEFCC097E0>

Does anyone know how this use case is supposed to work?
