The pipeline uses `RecursiveCharacterTextSplitter` with a default `chunk_size=1000` and stores chunks as `VARCHAR(2000)`:
```python
# pipelines/kubeflow-pipeline.py
'content_text': chunk[:2000],
```
But both `server/app.py` and `server-https/app.py` truncate to 400 characters before passing results to the LLM:
```python
# server/app.py and server-https/app.py
if isinstance(content_text, str) and len(content_text) > 400:
    content_text = content_text[:400] + "..."
```
With the default chunk size of 1000 chars, this server-side truncation discards roughly 60% of each chunk (600 of 1000 characters) before the LLM sees it. The 400-char limit is a reasonable trade-off for fitting more results into the context window. The issue is that the MCP server in `kagent-feast-mcp/mcp-server/server.py` does not truncate at all: it passes the full `content_text` to the agent.
This means the same query against the same index will produce different answer quality depending on whether it goes through Architecture A (main servers, 400 chars) or Architecture B (Kagent/MCP, full content). That makes it hard to evaluate retrieval quality consistently across the system.
Suggested approach: make the truncation limit configurable via environment variable (e.g., `CONTENT_MAX_CHARS`) and align the default across all server implementations so evaluation results are comparable regardless of code path.
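A minimal sketch of what that could look like. `CONTENT_MAX_CHARS` is the hypothetical variable name from the suggestion above, and `truncate_content` is an illustrative helper, not existing code in the repo; all three servers could import something like it so the default stays aligned:

```python
import os

# Hypothetical shared default; whatever limit the team settles on.
# The 400 here mirrors the current behavior of server/app.py.
CONTENT_MAX_CHARS = int(os.environ.get("CONTENT_MAX_CHARS", "400"))

def truncate_content(content_text, limit=CONTENT_MAX_CHARS):
    """Truncate retrieved chunk text to a configurable limit before
    it is passed to the LLM, appending "..." when text is cut."""
    if isinstance(content_text, str) and len(content_text) > limit:
        return content_text[:limit] + "..."
    return content_text
```

Setting `CONTENT_MAX_CHARS` high enough (or to the chunk size) would reproduce the MCP server's current pass-through behavior, so both code paths stay tunable from one knob.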
PR freeze is on, so just flagging this as an issue. Happy to pick it up when PRs open back up.