Problem
When using web search (use_search=True), there's no systematic way to track:
- Which sources were actually consulted
- Search failures or timeouts
- Data quality/confidence level
- What manual verification might be needed
Currently, users must parse free-text fields to understand if the agent had difficulties.
Proposed Solution
Add an optional track_limitations parameter that creates a _limitations metadata column with structured info about each row's search quality.
Schema
from typing import Literal, Optional

from pydantic import BaseModel, Field


class SearchLimitations(BaseModel):
    """Auto-generated metadata about search quality"""
    confidence_level: Literal["high", "medium", "low"] = Field(
        description="high=multiple concordant sources, medium=some gaps, low=sparse/failed"
    )
    sources_consulted: list[str] = Field(
        description="List of sources actually consulted"
    )
    search_failures: list[str] = Field(
        default_factory=list,
        description="Any searches that failed (timeout, 429, etc.)"
    )
    limitations: str = Field(
        description="Description of limitations: outdated data, conflicts, missing info"
    )
    manual_verification_needed: Optional[str] = Field(
        default=None,
        description="Suggested manual checks if confidence is low"
    )
Usage
result_df = dataframeit(
    data=df,
    questions=MySchema,
    use_search=True,
    track_limitations=True,  # NEW
)
# Result includes _limitations column with structured metadata
print(result_df['_limitations'].iloc[0])
# {'confidence_level': 'medium', 'sources_consulted': ['Orphanet', 'FDA'], ...}
Implementation Notes
- With search_per_field=False: a single _limitations column for the whole row
- With search_per_field=True: either:
  - One _limitations column aggregating all fields, OR
  - Per-field limitations in each field's nested dict (e.g., doenca_rara._limitations)
- The agent would be instructed to self-evaluate its search quality as part of the structured output.
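For the search_per_field=True case, one possible way to build the single aggregated _limitations value is sketched below. This is only an illustration, not existing dataframeit behavior: the helper name aggregate_limitations, the "worst confidence wins" rule, and the example field tratamento are all assumptions made for the sketch.

```python
# Hypothetical helper, not part of dataframeit: merges per-field limitation
# dicts (shaped like SearchLimitations) into one row-level dict.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def aggregate_limitations(per_field):
    """Merge {field_name: limitations_dict} into a single row-level dict."""
    if not per_field:
        raise ValueError("per_field must contain at least one field")

    sources, failures, notes, checks = [], [], [], []
    for field_name, lim in per_field.items():
        # Union of sources, keeping first-seen order.
        for src in lim.get("sources_consulted", []):
            if src not in sources:
                sources.append(src)
        # Prefix free-text entries with the field they came from.
        failures.extend(f"{field_name}: {msg}" for msg in lim.get("search_failures", []))
        if lim.get("limitations"):
            notes.append(f"{field_name}: {lim['limitations']}")
        if lim.get("manual_verification_needed"):
            checks.append(f"{field_name}: {lim['manual_verification_needed']}")

    # Assumption: the row is only as trustworthy as its weakest field.
    worst = min(
        (lim["confidence_level"] for lim in per_field.values()),
        key=CONFIDENCE_ORDER.__getitem__,
    )
    return {
        "confidence_level": worst,
        "sources_consulted": sources,
        "search_failures": failures,
        "limitations": "; ".join(notes),
        "manual_verification_needed": "; ".join(checks) or None,
    }


# Example input: "tratamento" is a made-up second field for illustration.
per_field = {
    "doenca_rara": {
        "confidence_level": "high",
        "sources_consulted": ["Orphanet", "FDA"],
        "search_failures": [],
        "limitations": "",
    },
    "tratamento": {
        "confidence_level": "low",
        "sources_consulted": ["FDA"],
        "search_failures": ["timeout"],
        "limitations": "no recent data",
        "manual_verification_needed": "check label",
    },
}
row = aggregate_limitations(per_field)
```

Taking the minimum confidence is a conservative choice; an implementation could instead keep per-field values and only summarize them, which is the second option listed above.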
Benefits
- Quality assurance: Easily filter rows with low confidence for manual review
- Debugging: Understand why certain searches failed
- Transparency: Document data provenance and limitations
- Reproducibility: Know which sources were consulted
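As a rough sketch of the quality-assurance use case, the snippet below picks out rows that need manual review. It assumes each _limitations value is a dict shaped like SearchLimitations; the function name rows_needing_review and the sample data are hypothetical.

```python
# Hypothetical helper, not part of dataframeit.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def rows_needing_review(limitations_column, threshold="low"):
    """Return positions of rows whose confidence is at or below `threshold`."""
    limit = CONFIDENCE_ORDER[threshold]
    return [
        i
        for i, lim in enumerate(limitations_column)
        if CONFIDENCE_ORDER[lim["confidence_level"]] <= limit
    ]


# Made-up values standing in for the contents of a _limitations column.
limitations_column = [
    {"confidence_level": "high", "sources_consulted": ["Orphanet", "FDA"]},
    {"confidence_level": "low", "sources_consulted": []},
    {"confidence_level": "medium", "sources_consulted": ["FDA"]},
]

low_only = rows_needing_review(limitations_column)
low_or_medium = rows_needing_review(limitations_column, threshold="medium")
```

With the proposed DataFrame output, the same list could presumably be obtained from result_df['_limitations'].tolist(), and the returned positions passed to result_df.iloc to pull the rows for review.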
Alternative: User-defined limitations field
Allow users to define their own limitations schema that gets appended to every search:
class MyLimitations(BaseModel):
    confianca: str
    fontes: str
    problemas: str

result_df = dataframeit(
    ...,
    limitations_schema=MyLimitations,  # Auto-added to each row
)