kagent-feast-mcp: runtime monkey-patch of Feast VARCHAR limit is fragile and breaks across versions #182

@JayDS22

Description

Summary

The kagent-feast-mcp ingestion pipeline uses a runtime monkey-patch to work around Feast's hardcoded max_length=512 VARCHAR limit in its Milvus online store integration. This approach modifies installed library source code on disk during pipeline execution, which is fragile and opaque.

Location

kagent-feast-mcp/pipelines/kubeflow-pipeline.py in the store_via_feast component:

# Patch Feast VARCHAR limit (hardcoded 512 -> 4096) and reload module
import importlib
import inspect
import feast.infra.online_stores.milvus_online_store.milvus as milvus_mod
src_file = inspect.getfile(milvus_mod)
with open(src_file, "r") as f:
    content = f.read()
if "max_length=512" in content:
    # Rewrites the installed Feast file in site-packages
    with open(src_file, "w") as f:
        f.write(content.replace("max_length=512", "max_length=4096"))
    # Re-execute the rewritten source so this process sees the new limit
    importlib.reload(milvus_mod)
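
The replacement only matches the exact text max_length=512. As a hedged illustration, the sketch below runs the same conditional replace over three made-up source snippets (they are not real Feast code, and the MAX_VARCHAR constant is hypothetical) to show how it degrades into a silent no-op, next to a fail-loud variant that raises when the expected text is missing.

```python
# Hypothetical demo: the pipeline's substring patch applied to source text
# written in different (equally valid) styles. Nothing here imports Feast.
OLD, NEW = "max_length=512", "max_length=4096"


def naive_patch(content: str) -> str:
    """Mirror of the pipeline's approach: replace if present, else do nothing."""
    if OLD in content:
        return content.replace(OLD, NEW)
    return content  # no match: unchanged, and no error is raised


def strict_patch(content: str) -> str:
    """Fail-loud variant: refuse to continue if the expected text is missing."""
    if OLD not in content:
        raise RuntimeError(f"expected {OLD!r} in Feast source; patch did not apply")
    return content.replace(OLD, NEW)


# Made-up snippets; MAX_VARCHAR is a hypothetical named constant.
variants = {
    "current": 'FieldSchema(name="v", dtype=DataType.VARCHAR, max_length=512)',
    "spaced": 'FieldSchema(name="v", dtype=DataType.VARCHAR, max_length = 512)',
    "constant": 'FieldSchema(name="v", dtype=DataType.VARCHAR, max_length=MAX_VARCHAR)',
}

for name, src in variants.items():
    patched = naive_patch(src)
    print(f"{name:9s} patched={patched != src}")
```

Only the "current" snippet is changed; the other two pass through untouched, and the pipeline would carry on with the 512 limit. The strict variant turns that situation into an immediate error instead.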

Problem

  1. Fragile across Feast versions: If Feast renames max_length or changes how the limit is written in that file, the string replacement silently does nothing, and content is again truncated at 512 chars without any warning.
  2. Opaque in debugging: Because the patch rewrites installed library source at runtime, pip show feast reports the same version whether or not the file has been modified. Anyone debugging the pipeline won't know about the patch unless they read this specific component's source.
  3. Unnecessary for the use case: The kagent-feast-mcp/mcp-server/server.py already uses pymilvus directly (via MilvusClient) to query the same Milvus instance without needing Feast at all. This is simpler, thread-safe, and avoids the VARCHAR limitation entirely.
  4. Drop-and-recreate pattern: The store_via_feast component also drops the entire Milvus collection before reinserting (utility.drop_collection). If the pipeline fails mid-ingestion (e.g., GitHub API rate limit), the collection is empty and the agent returns nothing. This compounds the Feast layer's fragility.
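
To make the drop-and-recreate failure mode concrete, here is a toy in-memory model of the two write strategies. It is a sketch under simplifying assumptions, not pymilvus: a dict stands in for the Milvus collection, its key plays the role of the primary key, and fetch_chunks is a hypothetical source that fails before delivering the first new chunk, as a GitHub rate limit might.

```python
from typing import Dict, Iterator, Tuple

Key = Tuple[str, int]  # (file_unique_id, chunk_index)


class IngestionError(Exception):
    """Stands in for a mid-run failure such as a GitHub API rate limit."""


def fetch_chunks(fail_after: int) -> Iterator[Tuple[Key, str]]:
    """Hypothetical source of 5 fresh chunks that fails after `fail_after` of them."""
    for i in range(5):
        if i == fail_after:
            raise IngestionError("rate limited")
        yield ("README.md", i), f"chunk {i} (new)"


def drop_and_recreate(store: Dict[Key, str], fail_after: int) -> None:
    store.clear()  # like utility.drop_collection: old rows vanish before new ones exist
    for key, text in fetch_chunks(fail_after):
        store[key] = text


def upsert(store: Dict[Key, str], fail_after: int) -> None:
    for key, text in fetch_chunks(fail_after):
        store[key] = text  # overwrite by key; rows not yet reached keep their old value


def seeded() -> Dict[Key, str]:
    """The collection as left by the previous successful run."""
    return {("README.md", i): f"chunk {i} (old)" for i in range(5)}


for strategy in (drop_and_recreate, upsert):
    store = seeded()
    try:
        strategy(store, fail_after=0)  # fail before the first new chunk arrives
    except IngestionError:
        pass
    print(f"{strategy.__name__}: {len(store)} chunks served after the failed run")
```

After the failed run, drop_and_recreate leaves 0 chunks for the agent to serve, while upsert still has all 5 from the previous run. With a later failure point, upsert serves a mix of new and old chunks, which is usually preferable to an empty collection.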

Suggested Direction

This supports the ADR-008 direction (pymilvus over Feast) that's being discussed. The MCP server already demonstrates the cleaner pattern:

  • Ingestion: Use pymilvus directly with MilvusClient.upsert(), with the primary key derived from file_unique_id + chunk_index so writes are idempotent (no drop-and-recreate).
  • Serving: Already done; kagent-feast-mcp/mcp-server/server.py uses MilvusClient directly.
  • Mark Feast pipelines as legacy: Keep them for reference but document that pymilvus direct is the production path.
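
The ingestion bullet could look roughly like the following minimal sketch, assuming the pymilvus 2.4-style MilvusClient API. The collection name, field names, EMBED_DIM, and the "#"-joined key format are all hypothetical; the point is that the schema (including the VARCHAR max_length) is owned by our code, the collection is created once and never dropped, and rows are upserted under a deterministic primary key. The client is passed in, so the sketch only imports pymilvus when it actually has to create the collection.

```python
from typing import Any, Dict, List, Sequence

# All names below are hypothetical; adjust to the real collection.
COLLECTION = "repo_docs"
MAX_CONTENT_LEN = 4096  # set by our schema, not by Feast's hardcoded 512
EMBED_DIM = 384


def chunk_pk(file_unique_id: str, chunk_index: int) -> str:
    """Deterministic primary key: the same chunk always maps to the same row."""
    return f"{file_unique_id}#{chunk_index}"


def ensure_collection(client: Any) -> None:
    """Create the collection with our own schema if it is missing; never drop it."""
    if client.has_collection(collection_name=COLLECTION):
        return
    from pymilvus import DataType, MilvusClient  # only needed on first creation

    schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=False)
    schema.add_field(field_name="pk", datatype=DataType.VARCHAR,
                     is_primary=True, max_length=512)
    schema.add_field(field_name="file_unique_id", datatype=DataType.VARCHAR, max_length=512)
    schema.add_field(field_name="chunk_index", datatype=DataType.INT64)
    schema.add_field(field_name="content", datatype=DataType.VARCHAR,
                     max_length=MAX_CONTENT_LEN)
    schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=EMBED_DIM)

    index_params = client.prepare_index_params()
    index_params.add_index(field_name="embedding", index_type="AUTOINDEX",
                           metric_type="COSINE")
    client.create_collection(collection_name=COLLECTION, schema=schema,
                             index_params=index_params)


def build_rows(file_unique_id: str, chunks: Sequence[str],
               embeddings: Sequence[Sequence[float]]) -> List[Dict[str, Any]]:
    """Turn one file's chunks into rows keyed by file_unique_id + chunk_index."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    rows = []
    for i, (text, vec) in enumerate(zip(chunks, embeddings)):
        # Fail loudly instead of truncating. UTF-8 byte length is the
        # conservative measure, since multibyte characters take several bytes.
        if len(text.encode("utf-8")) > MAX_CONTENT_LEN:
            raise ValueError(f"chunk {i} of {file_unique_id} exceeds {MAX_CONTENT_LEN} bytes")
        rows.append({
            "pk": chunk_pk(file_unique_id, i),
            "file_unique_id": file_unique_id,
            "chunk_index": i,
            "content": text,
            "embedding": list(vec),
        })
    return rows


def upsert_file(client: Any, file_unique_id: str, chunks: Sequence[str],
                embeddings: Sequence[Sequence[float]]) -> int:
    """Insert-or-replace one file's chunks; returns the number of rows written."""
    rows = build_rows(file_unique_id, chunks, embeddings)
    if rows:
        client.upsert(collection_name=COLLECTION, data=rows)
    return len(rows)
```

Against a real server this would be wired up as client = MilvusClient(uri="http://localhost:19530") (placeholder URI), then ensure_collection(client) once and upsert_file(client, ...) per file. Because the key is derived from file_unique_id and chunk_index, re-running after a failure overwrites the same rows instead of duplicating them, and previously ingested rows stay queryable throughout. One gap upsert does not close on its own: if a file shrinks from, say, 10 chunks to 8, the rows for chunks 8 and 9 remain, so a filtered delete on file_unique_id and chunk_index would still be needed for that case.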

A PR freeze is in effect, so I'm just flagging this for architecture discussion. Happy to contribute to the ADR-008 doc.

Related: #181 (content truncation), #63 (model reload), #28 (connection pooling), #72 (codebase cleanup)
