feat(py): integrate DAP into generate call, expand wildcards#5075
feat(py): integrate DAP into generate call, expand wildcards#5075huangjeff5 wants to merge 3 commits intojh-dap-pr1from
Conversation
…enerate integration
There was a problem hiding this comment.
Code Review
This pull request introduces a child registry pattern to Genkit, allowing for ephemeral, scoped registries that delegate lookups to a parent. It also adds support for expanding wildcard tool names (e.g., provider:tool/*) via dynamic action providers within the generate loop. I have reviewed the implementation and provided feedback regarding potential performance optimizations for wildcard name checking and registry traversal.
I am having trouble creating individual review comments. Click here to see my feedback.
py/packages/genkit/src/genkit/_ai/_generate.py (73)
The condition ':' not in name or not name.endswith('*') is slightly inefficient because ':' not in name is checked for every iteration, even if the name is a simple tool name. Consider checking for the wildcard suffix first, as it is more specific.
py/packages/genkit/src/genkit/_core/_registry.py (619-622)
The loop for parent_meta in await self._parent.list_actions(allowed_kinds): is called inside list_actions. If the registry hierarchy is deep, this could lead to redundant recursive calls. Consider if caching or a different traversal strategy is needed for performance.
| logger = get_logger(__name__) | ||
|
|
||
|
|
||
| async def expand_wildcard_tools(registry: Registry, tool_names: list[str]) -> list[str]: |
There was a problem hiding this comment.
This works a bit differently from the javascript implementation and there is a critical bug here regarding how the expanded tool names are returned.
Currently, the function strips the provider prefix and appends only the unqualified tool name:
tool_name = meta.get('name')
if tool_name:
expanded.append(str(tool_name))This creates a serious conflict resolution issue because it drops the explicit provider binding. When the generation loop later attempts to resolve this unqualified name (e.g., 'echo'), the registry will fall back to querying all registered DAPs in registration order to find a match.
The Bug Scenario:
- You have two MCP servers registered:
mcp1(registered first) andmcp2(registered second). - Both servers expose a tool named
'echo'. - The user explicitly requests tools from only the second server:
tools=['mcp2:tool/*']. - This function expands that request to just
['echo']. - When the framework resolves
'echo', it asks the DAPs in order.mcp1replies first saying "I have echo!" and its tool is executed instead ofmcp2's.
The Fix: To maintain the explicit provider binding (and match how the JS implementation behaves), we should reconstruct the fully-qualified DAP key so the registry knows exactly which provider to query later on:
tool_name = meta.get('name')
if tool_name:
expanded.append(f'/dynamic-action-provider/{provider_name}:{action_type}/{tool_name}')This guarantees the executed tool will strictly belong to the provider the user requested.
It would also be a good idea to have a test for this scenario.
| registry = Registry() | ||
| result = await expand_wildcard_tools(registry, ['my_tool', 'other_tool']) | ||
| assert result == ['my_tool', 'other_tool'] | ||
|
|
There was a problem hiding this comment.
Here's a test that shows the bug I mentioned:
async def test_wildcard_tools_avoids_shadowing_conflict() -> None:
"""Explicit wildcard provider paths should not be shadowed by earlier providers."""
ai = Genkit()
pm, _ = define_programmable_model(ai)
call_log: list[str] = []
class Inp(BaseModel):
x: str
async def echo1_fn(inp: Inp) -> str:
call_log.append('mcp1')
return 'echo 1'
async def echo2_fn(inp: Inp) -> str:
call_log.append('mcp2')
return 'echo 2'
# Detached Actions (not registered in root registry directly)
echo1_action = Action(name='echo', kind=ActionKind.TOOL, fn=echo1_fn, metadata={'name': 'echo'})
echo2_action = Action(name='echo', kind=ActionKind.TOOL, fn=echo2_fn, metadata={'name': 'echo'})
async def dap1_fn() -> DapValue:
return {'tool': [echo1_action]}
async def dap2_fn() -> DapValue:
return {'tool': [echo2_action]}
# Register mcp1 first. If resolution falls back to an unqualified lookup, mcp1 will "win".
ai.define_dynamic_action_provider('mcp1', dap1_fn)
ai.define_dynamic_action_provider('mcp2', dap2_fn)
# The model calls the 'echo' tool
pm.responses = [
_tool_call_response('echo', {'x': 'hello'}),
_text_response('finished'),
]
response = await ai.generate(
model='programmableModel',
prompt='use echo',
# Crucially, we explicitly request tools from mcp2 ONLY
tools=['mcp2:tool/*'],
)
assert response.text == 'finished'
# If the bug is present, this will fail because it will fall back to the unqualified
# global loop and find mcp1's 'echo' tool instead.
assert call_log == ['mcp2']
Summary
_generate_actioncreates aneffective_registry(child of the root registry) at the start of each generate call. Tool resolution and resource lookup use this child so DAP-resolved tools are scoped to the call and do not pollute the root registry.expand_wildcard_tools()expands patterns such asprovider:tool/*andprovider:tool/prefix*into concrete tool names viaDynamicActionProvider.list_action_metadata. Expansion runs before parameter resolution so the model receives a fullToolDefinitionlist.resolve_tool_requestsreceiveseffective_registryso execution uses the child’s resolved action cache.