Skip to content

feat: add start_date/end_date time filter support for search job#317

Merged
jinliyl merged 3 commits into
agentscope-ai:mainfrom
xyf2020:time_filter
Jul 3, 2026
Merged

feat: add start_date/end_date time filter support for search job#317
jinliyl merged 3 commits into
agentscope-ai:mainfrom
xyf2020:time_filter

Conversation

@xyf2020

@xyf2020 xyf2020 commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator
  • Add _extract_date_from_path to extract validated YYYY-MM-DD from chunk paths
  • Add start_date/end_date filtering in _matches_search_filter
  • Implement progressive recall in FaissLocalFileStore.vector_search
  • Promote start_date/end_date from context to search_filter in SearchStep
  • Add start_date/end_date parameters to search job in default.yaml
  • Add unit tests for date filter functionality

- Add _extract_date_from_path to extract validated YYYY-MM-DD from chunk paths
- Add start_date/end_date filtering in _matches_search_filter
- Implement progressive recall in FaissLocalFileStore.vector_search
- Promote start_date/end_date from context to search_filter in SearchStep
- Add start_date/end_date parameters to search job in default.yaml
- Add unit tests for date filter functionality
@jinliyl

jinliyl commented Jul 3, 2026

Copy link
Copy Markdown
Member

Thanks for the PR. I checked the changes locally and ran the related/unit tests:

  • python -m pytest tests/unit/test_file_store_consistency.py tests/unit/test_search_step.py passed
  • python -m pytest tests/unit passed

The no-date path looks backward compatible in normal usage: when start_date and end_date are not provided, SearchStep does not add them to search_filter, LocalFileStore skips the date filter, and FAISS still uses the previous no-filter over-fetch path.

A couple of points would be good to tighten before merging:

  1. Please validate/normalize start_date and end_date before comparing them.

    Right now _matches_search_filter compares the raw filter values as strings. If a caller passes an invalid or non-normalized date such as 2026-2-28 or abc, the lexicographic comparison can produce incorrect results. For example, an invalid end_date may keep dates that should be excluded, and an invalid start_date may exclude all dated paths.

    We already have strict date helpers such as parse_daily_date; it would be better to reject invalid values in SearchStep with a clear error, or normalize them before they reach the file store.

  2. Please clarify the intended behavior for paths without an embedded date.

    Currently, when a date filter is provided, paths where _extract_date_from_path returns None are still retained, e.g. digest/personal/topic.md. This may be intentional because digest/ contains long-term memory, but it should be documented or explicitly tested as part of the expected semantics. If the intended meaning is “only return results within this time range”, then no-date paths should probably be excluded when a date filter is active.

  3. Minor robustness suggestion: _extract_date_from_path currently uses parts[1].split(".")[0], so a second path segment like 2026-05-18.anything would be accepted as 2026-05-18. It may be safer to only accept either an exact YYYY-MM-DD segment or the day-index form YYYY-MM-DD.md.

Overall the implementation direction looks reasonable, especially the progressive FAISS recall. I think the main thing to address is date input validation and making the no-date-path semantics explicit.

@jinliyl jinliyl self-requested a review July 3, 2026 06:55
@CLAassistant

CLAassistant commented Jul 3, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

Address three code-review comments on the time_filter search feature:

1. Validate/normalize start_date and end_date before string comparison.
   _matches_search_filter does lexicographic comparison against path_date
   (always canonical YYYY-MM-DD). Raw caller values like '2026-2-28' or
   'abc' would produce silently wrong results. Now SearchStep normalizes
   valid dates via extract_daily_date (with strptime fallback for
   non-zero-padded input) and silently ignores invalid dates with a
   logger.warning, removing them from the filter.

2. Clarify behavior for paths without embedded dates.
   Added optional strict_date_filter parameter (default False). When True
   and at least one date bound is active, chunks whose path yields no date
   (e.g. digest/personal/topic.md) are excluded. When False (default),
   the existing behavior is preserved — dateless paths pass through.

3. Harden _extract_date_from_path against non-standard suffixes.
   Previously parts[1].split('.')[0] accepted '2026-05-18.anything' as a
   valid date. Now only exact 'YYYY-MM-DD' (dir) and 'YYYY-MM-DD.md'
   (day-index) forms are accepted.

@jinliyl jinliyl left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ployts ployts left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jinliyl jinliyl merged commit 43a407b into agentscope-ai:main Jul 3, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants