Is there a way in MCP to stream an LLM response chunk by chunk back to the client?

I'm using FastMCP in Python to implement an MCP server. I've run into a problem with streaming the tokens generated by the LLM: I don't want to wait for the complete response and return the whole text at once; I'd rather stream it chunk by chunk to improve the perceived response time for the user. This seems like a valid use case to me when working with MCP.

I don't have control over the client, since I'm using clients like LibreChat or Open WebUI to connect to the MCP server.

In Open WebUI, for example, you can implement a pipeline that supports streaming (yielding chunk by chunk); a rough sketch is below.
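
For reference, this is roughly what such a streaming pipeline looks like. The pipe signature follows my reading of the examples in the open-webui/pipelines repository, so treat the exact parameter names as approximate:

from typing import Generator, Iterator, List, Union

class Pipeline:
    def __init__(self):
        self.name = "Streaming Example"

    # Parameter names follow the open-webui/pipelines examples (approximate).
    def pipe(
        self, user_message: str, model_id: str, messages: List[dict], body: dict
    ) -> Union[str, Generator, Iterator]:
        # Returning a generator lets Open WebUI render the chunks as they arrive.
        for word in f"Streaming a reply about {user_message}".split():
            yield word + " "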

Here is a minimal example of my use case:

from fastmcp import FastMCP
from llama_index.llms.lmstudio import LMStudio
import asyncio

mcp = FastMCP()

@mcp.tool()
async def story_teller(topic: str):
    llm = LMStudio(model_name="qwen/qwen3-4b-2507")
    prompt = f"Tell me a story about {topic}."
    async for chunk in await llm.astream(prompt):
        yield chunk

if __name__ == "__main__":
    asyncio.run(mcp.run_async(host="0.0.0.0", port=8001, transport="streamable-http"))

Unfortunately, none of the MCP clients I tested can handle the response from the server:

<async_generator object story_teller at 0x000002DEFCC097E0>

Does anyone know how this use case is supposed to work?
