feat: add start_date/end_date time filter support for search job #317
xyf2020 commented on Jul 2, 2026
- Add _extract_date_from_path to extract validated YYYY-MM-DD from chunk paths
- Add start_date/end_date filtering in _matches_search_filter
- Implement progressive recall in FaissLocalFileStore.vector_search
- Promote start_date/end_date from context to search_filter in SearchStep
- Add start_date/end_date parameters to search job in default.yaml
- Add unit tests for date filter functionality
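The progressive recall idea can be sketched as follows. This is a minimal illustration, not the code in this PR: `progressive_search`, `ranked_search`, `matches`, and `growth` are hypothetical names, and `ranked_search(k)` stands in for a top-k vector query (for FAISS it would presumably wrap `index.search`). The sample paths are made up. The point is that a date filter applied after a fixed top-k query can leave fewer than `limit` results, so the candidate window is widened until enough hits survive or the index is exhausted.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (chunk path, similarity score)


def progressive_search(
    ranked_search: Callable[[int], List[Hit]],
    matches: Callable[[str], bool],
    limit: int,
    total: int,
    growth: int = 4,
) -> List[Hit]:
    """Widen the candidate window until `limit` hits pass the filter.

    ranked_search(k) is a hypothetical stand-in for a top-k vector query;
    total is the number of vectors in the index.
    """
    if limit <= 0 or total <= 0:
        return []
    k = min(total, limit)
    while True:
        # Filter the current window, keeping the ranking order.
        kept = [hit for hit in ranked_search(k) if matches(hit[0])]
        if len(kept) >= limit or k >= total:
            return kept[:limit]
        k = min(total, k * growth)  # not enough survivors: fetch more


# Usage with a fake, already ranked corpus (paths are illustrative only).
corpus = [
    ("daily/2026-05-20/a.md", 0.99),
    ("daily/2026-05-19/b.md", 0.95),
    ("daily/2026-04-01/c.md", 0.90),
    ("daily/2026-03-15/d.md", 0.80),
    ("daily/2026-04-02/e.md", 0.70),
]
calls: List[int] = []


def fake_search(k: int) -> List[Hit]:
    calls.append(k)  # record each window size for inspection
    return corpus[:k]


def in_april(path: str) -> bool:
    return "/2026-04-" in path


hits = progressive_search(fake_search, in_april, limit=2, total=len(corpus), growth=2)
```

Here the window grows from 2 to 4 to 5 candidates before two April chunks are found. A real implementation would likely also cap the number of rounds to bound latency.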
Thanks for the PR. I checked the changes locally and ran the related/unit tests:
The no-date path looks backward compatible in normal usage: when
A couple of points would be good to tighten before merging:
Overall the implementation direction looks reasonable, especially the progressive FAISS recall. I think the main thing to address is date input validation and making the no-date-path semantics explicit.
Address three code-review comments on the time_filter search feature:
1. Validate/normalize start_date and end_date before string comparison.
_matches_search_filter does lexicographic comparison against path_date
(always canonical YYYY-MM-DD). Raw caller values like '2026-2-28' or
'abc' would produce silently wrong results. Now SearchStep normalizes
valid dates via extract_daily_date (with strptime fallback for
non-zero-padded input) and removes invalid dates from the filter,
logging each one with logger.warning.
2. Clarify behavior for paths without embedded dates.
Added optional strict_date_filter parameter (default False). When True
and at least one date bound is active, chunks whose path yields no date
(e.g. digest/personal/topic.md) are excluded. When False (default),
the existing behavior is preserved: dateless paths pass through.
3. Harden _extract_date_from_path against non-standard suffixes.
Previously parts[1].split('.')[0] accepted '2026-05-18.anything' as a
valid date. Now only exact 'YYYY-MM-DD' (dir) and 'YYYY-MM-DD.md'
(day-index) forms are accepted.
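
The three fixes above can be sketched together as follows. This is an illustrative approximation under stated assumptions, not the actual implementation: `normalize_date`, `extract_date_from_path`, and `matches_date_filter` are hypothetical stand-ins for the SearchStep normalization, `_extract_date_from_path`, and the date part of `_matches_search_filter`; the bounds are assumed to be inclusive; and the `daily/...` paths are made up. The date is read from `parts[1]`, as in the original code.

```python
import logging
import re
from datetime import datetime
from typing import Optional

logger = logging.getLogger(__name__)
_CANONICAL = re.compile(r"\d{4}-\d{2}-\d{2}")


def normalize_date(raw: object) -> Optional[str]:
    """Return canonical YYYY-MM-DD, or None (after a warning) if invalid."""
    try:
        # strptime accepts non-zero-padded fields such as '2026-2-28'.
        parsed = datetime.strptime(str(raw).strip(), "%Y-%m-%d")
    except ValueError:
        logger.warning("ignoring invalid date filter value: %r", raw)
        return None
    return parsed.strftime("%Y-%m-%d")


def extract_date_from_path(path: str) -> Optional[str]:
    """Accept only 'YYYY-MM-DD' or 'YYYY-MM-DD.md' as the second path part."""
    parts = path.split("/")
    if len(parts) < 2:
        return None
    segment = parts[1]
    if segment.endswith(".md"):
        segment = segment[:-3]
    if not _CANONICAL.fullmatch(segment):
        return None  # rejects e.g. '2026-05-18.anything'
    try:
        datetime.strptime(segment, "%Y-%m-%d")  # rejects e.g. '2026-13-40'
    except ValueError:
        return None
    return segment


def matches_date_filter(
    path: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    strict_date_filter: bool = False,
) -> bool:
    """Compare canonical date strings; bounds are assumed inclusive."""
    if start_date is None and end_date is None:
        return True  # no active bound: nothing to filter
    path_date = extract_date_from_path(path)
    if path_date is None:
        return not strict_date_filter  # dateless paths pass unless strict
    if start_date is not None and path_date < start_date:
        return False
    if end_date is not None and path_date > end_date:
        return False
    return True
```

String comparison is only safe once both sides are canonical: `"2026-2-28" > "2026-05-18"` evaluates to True in Python even though February precedes May, which is why normalization has to happen before the filter runs.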