longmemeval#316

Closed
xyf2020 wants to merge 12 commits into agentscope-ai:main from xyf2020:main

xyf2020 (Collaborator) commented Jul 2, 2026

No description provided.

xyf2020 added 3 commits July 1, 2026 23:56
…ate injection

- Add evaluation/longmemeval/ with run.py, config.yaml, and test scripts
- Add reme/config/longmemeval.yaml for evaluation-specific model config
- Add tool_defaults mechanism to as_agent_wrapper for injecting default
  tool kwargs (uses setdefault so LLM-provided values take priority)
- Pass tool_defaults={'daily_write': {'date': day}} in auto_memory to
  ensure notes always use the correct historical date
- Add timestamp interpolation (_interpolate_timestamps) in auto_memory
  for filling missing created_at fields via linear interpolation
- Evaluation pipeline: ingest sessions -> dream -> search -> answer -> judge
- Uses qwen3.6-flash for memory, qwen3.7-max for answer/judge
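
Two of the mechanisms listed above, the `setdefault`-based default injection and the linear interpolation of missing `created_at` values, can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `apply_tool_defaults` and `interpolate_timestamps`, their signatures, and the choice to copy the nearest known timestamp into leading or trailing gaps are all assumptions.

```python
from datetime import datetime
from typing import Any, Optional


def apply_tool_defaults(
    tool_name: str,
    llm_kwargs: dict[str, Any],
    tool_defaults: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Fill in default kwargs for a tool call (hypothetical helper).

    setdefault only writes a key that is absent, so any value the
    LLM already supplied is left untouched.
    """
    merged = dict(llm_kwargs)  # copy so the caller's dict is not mutated
    for key, value in tool_defaults.get(tool_name, {}).items():
        merged.setdefault(key, value)
    return merged


def interpolate_timestamps(
    stamps: list[Optional[datetime]],
) -> list[Optional[datetime]]:
    """Fill None entries by linear interpolation (hypothetical helper).

    Gaps between two known timestamps are spaced evenly by position.
    Assumption: gaps before the first or after the last known value
    are filled with the nearest known timestamp.
    """
    filled = list(stamps)
    known = [i for i, s in enumerate(filled) if s is not None]
    if not known:
        return filled  # nothing to anchor the interpolation on
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):
        span = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + span * (i - left) / (right - left)
    return filled
```

With this merge rule, a call such as `apply_tool_defaults("daily_write", {"content": "x"}, {"daily_write": {"date": day}})` gains the `date` key, while a `date` already provided by the model is kept as-is.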

CLAassistant commented Jul 2, 2026


CLA assistant check
All committers have signed the CLA.

xyf2020 and others added 9 commits July 2, 2026 14:47
…eval runner

- Replace async execution with synchronous + multiprocessing for parallel item evaluation
- Add filter_future_sessions option to only ingest sessions <= question date
- Add question_types filtering in config
- Add result summary with binary accuracy and avg score
- Update config defaults (oracle variant, 50 items, 32 workers)
- Minor code style fixes in agent_wrapper and auto_memory
- Add BenchQueryStep using agent_wrapper with search job tool
- Replace manual search+LLM answer in run.py with bench_query_job
- Remove unused answer LLM config from longmemeval.yaml
- Register benchmark step module in steps/__init__.py
- Add _extract_date_from_path to extract validated YYYY-MM-DD from chunk paths
- Add start_date/end_date filtering in _matches_search_filter
- Implement progressive recall in FaissLocalFileStore.vector_search
- Promote start_date/end_date from context to search_filter in SearchStep
- Add start_date/end_date parameters to search job in default.yaml
- Add unit tests for date filter functionality
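
The commit list above does not spell out what "progressive recall" means in `FaissLocalFileStore.vector_search`. One plausible reading, sketched here purely as an assumption, is that the search widens its candidate window until enough results survive the filter. The helper name `progressive_recall`, its parameters, and the use of a pre-ranked list in place of a real vector index are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def progressive_recall(
    ranked: Sequence[T],
    keep: Callable[[T], bool],
    limit: int,
    initial_factor: int = 2,
    growth: int = 2,
) -> list[T]:
    """Return up to `limit` filtered items, widening the window as needed.

    `ranked` stands in for nearest-neighbour results, best first. A real
    store would re-query the index with a larger k instead of slicing.
    Assumes growth > 1 so the window always expands.
    """
    fetch = max(limit * initial_factor, 1)
    while True:
        window = ranked[:fetch]
        hits = [item for item in window if keep(item)]
        # Stop when enough items passed the filter or nothing is left to fetch.
        if len(hits) >= limit or fetch >= len(ranked):
            return hits[:limit]
        fetch *= growth
```

The point of the design is that a restrictive filter (for example a narrow date range) does not starve the result set: a fixed top-k followed by filtering can return too few hits, whereas growing the window trades extra search work for fuller recall.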
Address three code-review comments on the time_filter search feature:

1. Validate/normalize start_date and end_date before string comparison.
   _matches_search_filter does lexicographic comparison against path_date
   (always canonical YYYY-MM-DD). Raw caller values like '2026-2-28' or
   'abc' would produce silently wrong results. Now SearchStep normalizes
   valid dates via extract_daily_date (with strptime fallback for
   non-zero-padded input) and drops invalid dates from the filter,
   logging each one with logger.warning.

2. Clarify behavior for paths without embedded dates.
   Added optional strict_date_filter parameter (default False). When True
   and at least one date bound is active, chunks whose path yields no date
   (e.g. digest/personal/topic.md) are excluded. When False (default),
   the existing behavior is preserved: dateless paths pass through.

3. Harden _extract_date_from_path against non-standard suffixes.
   Previously parts[1].split('.')[0] accepted '2026-05-18.anything' as a
   valid date. Now only exact 'YYYY-MM-DD' (dir) and 'YYYY-MM-DD.md'
   (day-index) forms are accepted.
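
The three review fixes above can be illustrated together. The sketch below is not the PR's code: `normalize_date`, `extract_date_from_path`, and `matches_date_filter` are hypothetical stand-ins, the `<root>/<date>/...` path layout is inferred from the `parts[1]` reference, and treating both bounds as inclusive is an assumption. The warning log for invalid dates is omitted for brevity.

```python
import re
from datetime import datetime
from typing import Optional

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return canonical YYYY-MM-DD, or None if the input is not a date.

    strptime accepts non-zero-padded months and days, so '2026-2-28'
    is normalized rather than rejected.
    """
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date().isoformat()
    except (ValueError, AttributeError):
        return None


def extract_date_from_path(path: str) -> Optional[str]:
    """Accept only 'YYYY-MM-DD' or 'YYYY-MM-DD.md' as the second component."""
    parts = path.split("/")
    if len(parts) < 2:
        return None
    candidate = parts[1]
    if candidate.endswith(".md"):
        candidate = candidate[: -len(".md")]
    if not _DATE_RE.fullmatch(candidate):
        return None  # rejects e.g. '2026-05-18.anything'
    return normalize_date(candidate)  # also rejects impossible dates


def matches_date_filter(
    path: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    strict_date_filter: bool = False,
) -> bool:
    """Compare canonical date strings lexicographically (bounds inclusive)."""
    if start_date is None and end_date is None:
        return True
    path_date = extract_date_from_path(path)
    if path_date is None:
        # Dateless paths pass through unless strict filtering is requested.
        return not strict_date_filter
    if start_date is not None and path_date < start_date:
        return False
    if end_date is not None and path_date > end_date:
        return False
    return True
```

Lexicographic comparison is only safe here because both sides are canonical, zero-padded strings, which is why normalization has to happen before the filter is applied.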
xyf2020 closed this Jul 3, 2026