Skip to content

fix: address review issues across date range, universe rules, and tag fetching#37

Merged
lilinoct18-coder merged 10 commits into
devfrom
fix/gemini-review-fixes
Feb 17, 2026
Merged

fix: address review issues across date range, universe rules, and tag fetching#37
lilinoct18-coder merged 10 commits into
devfrom
fix/gemini-review-fixes

Conversation

@novis10813

Copy link
Copy Markdown
Owner

Summary

This PR addresses reviewer-raised correctness and safety issues in data range handling, universe filtering, and CoinGecko tag fetching behavior.

Core fixes

  • Make explicit end_date semantics inclusive-by-day in date range calculation by using [start, end + 1 day) for start_date + end_date inputs.
  • Harden MinListingAge to exclude assets with unknown listing dates by default (fill_null(False)), avoiding risky inclusion under missing metadata.
  • Require explicit symbols in TagProvider.fetch_async to prevent accidental full-database CoinGecko fetches.
  • Improve CoinGecko ticker collision handling by preferring canonical mapping when coin_id == symbol.lower().

Additional reliability updates (same branch scope)

  • Ensure date utilities align daily boundaries at UTC midnight.
  • Make sync metadata/tags fetch paths notebook-safe via async runner compatibility.
  • Re-prepare analyzer data when newly requested periods are missing.
  • Remove redundant mask reapplication in vectorized backtester.
  • Normalize 60_000 time literal style in safe-ops regression tests.

Why

  • Reduce boundary surprises for users who provide explicit date ranges.
  • Default to conservative filtering when critical metadata is missing.
  • Prevent long-running or blocking external fetch operations in research workflows.
  • Improve determinism/robustness in symbol-to-id resolution.

Test Plan

  • Run targeted suites for changed modules:
    • tests/test_data_loader.py::TestCalculateDateRange
    • tests/universe/test_universe_rules.py
    • tests/universe/test_tags.py
    • tests/factors/test_safe_operations.py
    • tests/factors/test_analyzer.py
    • tests/backtest/test_vectorized.py
    • tests/data/test_timestamp_utils.py
    • tests/universe/test_metadata.py

Note: In this local environment, pytest is currently blocked by interpreter mismatch (typing.Self import indicates Python 3.11+ requirement while runtime is 3.10). Please run the test plan in the project target Python version.

@lilinoct18-coder lilinoct18-coder left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

這邊有些東西需要你說明和再次確認

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

這兩件事情並不等價,需要查看 backtest 中處理 mask 的方式有沒有對應的改動

Comment thread src/factorium/universe/metadata.py Outdated

def fetch(self) -> dict[str, SymbolMetadata]:
return asyncio.run(self.fetch_async())
from ..data.loader import _run_async

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import 請寫在檔案的最上面

listing_map[sym] = int(listing_date)

if not listing_map:
return pl.lit(True)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要檢查這塊的邏輯,為什麼是把 True 直接改成 False?

Comment thread src/factorium/universe/tags.py Outdated
import aiohttp


COINGECKO_BASE_URL = "https://api.coingecko.com/api/v3"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

這類固定的值也許可以額外使用一份檔案維護?
我記得在另外一份檔案裡面有定義固定的數值,也許可以參考那份檔案

Comment thread src/factorium/universe/tags.py Outdated

def fetch(self, symbols: list[str] | None = None) -> dict[str, list[str]]:
return asyncio.run(self.fetch_async(symbols=symbols))
from ..data.loader import _run_async

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import 一樣要放在檔案上方

@novis10813

Copy link
Copy Markdown
Owner Author

回覆 Review Feedback

針對兩個主要的技術疑慮,我們已經建立了完整的驗證程式來證明改動的正確性。


1. vectorized.py - mask reapplication 刪除的安全性

Reviewer 疑慮: 「這兩件事情並不等價,需要查看 backtest 中處理 mask 的方式」

驗證結果: ✅ 刪除是安全的,兩者行為等價

驗證方法:

  1. 數據流分析: 追蹤 _mask 在整個 backtest pipeline 中的應用
  2. 行為等價性測試: 建立 4 種 edge cases 比較新舊行為
  3. 回歸測試: 61 個現有測試全部通過

關鍵發現:

  • _mask_calculate_weights() 中已經被應用於 prev_signal → 產生 _masked_signal
  • Weights 是從已經被 mask 的 _masked_signal 計算出來的
  • 因此 weight 欄位已經被間接 mask(null signals → 0/null weights)
  • 再次 apply mask 到 weights 是冗餘操作(no-op)

驗證腳本: scripts/verify_mask_reapplication.py

Test Scenarios:
- all_valid: PASS (both versions produce identical outputs)
- partially_masked: PASS (both versions produce identical outputs)
- fully_masked: PASS (both versions produce identical outputs)
- with_nans: PASS (both versions produce identical outputs)

Conclusion: All scenarios equivalent

詳細分析: 見 docs/analysis/mask_data_flow.md


2. rules.py - 為什麼將 True 改成 False

Reviewer 疑慮: 「需要檢查這塊的邏輯,為什麼是把 True 直接改成 False」

驗證結果: ✅ 改為 False 是更安全的設計

核心原因 - 保守原則 (Conservative Principle):

  • 舊邏輯 (True): 沒有上市日期資料時,預設包含所有資產
  • 新邏輯 (False): 沒有上市日期資料時,預設排除所有資產

為什麼 False 更安全:

  1. 未知風險: 如果我們不知道資產的上市日期,無法判斷它是否適合交易
  2. 回測品質: 包含未上市或無效資產會導致錯誤的回測結果
  3. 保守設計: 金融系統應該採用「疑罪從有」原則 - 不確定就不交易

Edge Case 驗證:

情境 舊邏輯 (True) 新邏輯 (False) 評估
無上市資料 包含所有 (危險) 排除所有 (安全) 新邏輯更安全
部分資料缺失 包含缺失項 排除缺失項 新邏輯更嚴謹
完整資料 檢查日期 檢查日期 行為一致

驗證腳本: verify_listing_logic.py

Case 1: No listing metadata available
- Old logic: include all symbols (unsafe)
- New logic: exclude all (safe)

Case 2: Partial listing metadata
- Old logic: includes symbols without dates
- New logic: excludes them (conservative)

與其他 filters 的一致性:

  • 我們檢查了其他 filters(如 ExcludeStablecoins
  • 建議未來的 delisting_filter 也採用相同的保守原則

總結

這兩個改動都基於嚴謹的驗證保守的設計原則

  1. vectorized.py: 冗餘代碼刪除,經證明不影響行為
  2. rules.py: 從寬鬆改為保守,提升回測品質和安全性

驗證腳本已提交到 PR 中,可以執行驗證:

uv run python scripts/verify_mask_reapplication.py
uv run python verify_listing_logic.py

- Move _run_async import to module level in metadata.py and tags.py
- Patch local module reference in tests to correctly mock _run_async
- Fix NameError in test_metadata.py due to missing import

@lilinoct18-coder lilinoct18-coder left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead

@lilinoct18-coder lilinoct18-coder merged commit 696112d into dev Feb 17, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants