fix: address review issues across date range, universe rules, and tag fetching #37
lilinoct18-coder
left a comment
There are a few things here that I need you to explain and double-check.
These two changes are not equivalent; please check whether the way backtest handles the mask was changed accordingly.
def fetch(self) -> dict[str, SymbolMetadata]:
    return asyncio.run(self.fetch_async())
    from ..data.loader import _run_async
Please put imports at the top of the file.
listing_map[sym] = int(listing_date)

if not listing_map:
    return pl.lit(True)
This logic needs to be checked: why was True changed directly to False?
import aiohttp

COINGECKO_BASE_URL = "https://api.coingecko.com/api/v3"
Could fixed values like this be maintained in a separate file?
I recall that another file already defines constant values; it may be worth referring to that file.
def fetch(self, symbols: list[str] | None = None) -> dict[str, list[str]]:
    return asyncio.run(self.fetch_async(symbols=symbols))
    from ..data.loader import _run_async
This import should also go at the top of the file.
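Reply to Review Feedback
For the two main technical concerns, we have built verification scripts to demonstrate that the changes are correct.
1. vectorized.py - safety of removing the mask reapplication
Reviewer concern: "These two changes are not equivalent; please check how backtest handles the mask."
Verification result: ✅ The removal is safe; the two behaviors are equivalent.
Verification method:
Key findings:
Verification script:
Detailed analysis: see
2. rules.py - why True was changed to False
Reviewer concern: "This logic needs to be checked: why was True changed directly to False?"
Verification result: ✅ Changing it to False is the safer design.
Core reason - Conservative Principle:
Why False is safer:
Edge case verification:
Verification script:
Consistency with other filters:
Summary
Both changes are based on rigorous verification and conservative design principles:
The verification scripts have been committed to the PR and can be run with:
uv run python scripts/verify_mask_reapplication.py
uv run python verify_listing_logic.py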
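To make the rules.py point concrete, here is a minimal plain-Python sketch of the conservative default. It is an illustration only: the function name `passes_min_listing_age`, its parameters, and the millisecond timestamps are assumptions, and the real rule builds a Polars expression (with `fill_null(False)`) rather than evaluating one symbol at a time.

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def passes_min_listing_age(
    symbol: str,
    as_of_ms: int,
    listing_map: dict[str, int],
    min_age_days: int,
) -> bool:
    """Hypothetical stand-in for the MinListingAge decision on one symbol."""
    listing_ms = listing_map.get(symbol)
    if listing_ms is None:
        # Unknown listing date: exclude rather than include (conservative default).
        return False
    return as_of_ms - listing_ms >= min_age_days * MS_PER_DAY


as_of = 100 * MS_PER_DAY
listing_dates = {"OLDCOIN": 0, "NEWCOIN": 95 * MS_PER_DAY}

old_ok = passes_min_listing_age("OLDCOIN", as_of, listing_dates, 30)
new_ok = passes_min_listing_age("NEWCOIN", as_of, listing_dates, 30)
unknown_ok = passes_min_listing_age("MYSTERY", as_of, listing_dates, 30)
empty_map_ok = passes_min_listing_age("OLDCOIN", as_of, {}, 30)
```

With an empty `listing_map`, every symbol is treated as unknown and excluded, which corresponds to returning False instead of True in the empty-map branch that the review comment asks about.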
- Move _run_async import to module level in metadata.py and tags.py
- Patch local module reference in tests to correctly mock _run_async
- Fix NameError in test_metadata.py due to missing import
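The second item follows from how `unittest.mock.patch` works: once a name is imported at module level, the test has to patch the name in the module that uses it, not in the module that defines it. The sketch below shows this with two throwaway modules; `demo_loader` and `demo_metadata` are hypothetical stand-ins, and the fake `_run_async` is a plain synchronous function chosen only to keep the example short.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that defines the helper.
loader = types.ModuleType("demo_loader")
exec(
    "def _run_async(value):\n"
    "    return f'real:{value}'\n",
    loader.__dict__,
)
sys.modules["demo_loader"] = loader

# Hypothetical stand-in for a module that imports the helper at module level.
metadata = types.ModuleType("demo_metadata")
exec(
    "from demo_loader import _run_async\n"
    "def fetch():\n"
    "    return _run_async('payload')\n",
    metadata.__dict__,
)
sys.modules["demo_metadata"] = metadata

# Patching the defining module does not affect the name already bound
# inside demo_metadata, so fetch() still calls the real helper.
with mock.patch("demo_loader._run_async", return_value="mocked"):
    patched_at_source = metadata.fetch()

# Patching the name where it is looked up replaces what fetch() calls.
with mock.patch("demo_metadata._run_async", return_value="mocked"):
    patched_at_use = metadata.fetch()
```

Here `patched_at_source` is still the real result, while `patched_at_use` is the mocked value.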
Summary
This PR addresses reviewer-raised correctness and safety issues in data range handling, universe filtering, and CoinGecko tag fetching behavior.
Core fixes
- end_date semantics inclusive-by-day in date range calculation by using [start, end + 1 day) for start_date + end_date inputs.
- MinListingAge to exclude assets with unknown listing dates by default (fill_null(False)), avoiding risky inclusion under missing metadata.
- symbols in TagProvider.fetch_async to prevent accidental full-database CoinGecko fetches.
- coin_id == symbol.lower().
Additional reliability updates (same branch scope)
- 60_000 time literal style in safe-ops regression tests.
Why
Test Plan
- tests/test_data_loader.py::TestCalculateDateRange
- tests/universe/test_universe_rules.py
- tests/universe/test_tags.py
- tests/factors/test_safe_operations.py
- tests/factors/test_analyzer.py
- tests/backtest/test_vectorized.py
- tests/data/test_timestamp_utils.py
- tests/universe/test_metadata.py
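As an illustration of the inclusive-by-day end_date fix listed under Core fixes, the half-open interval [start, end + 1 day) can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the UTC midnight boundaries, and the epoch-millisecond return type are hypothetical and may differ from the project's actual date range helper.

```python
from datetime import date, datetime, time, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(day: date) -> int:
    """UTC midnight of `day` as integer epoch milliseconds."""
    midnight = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return (midnight - EPOCH) // timedelta(milliseconds=1)


def calculate_date_range_ms(start_date: date, end_date: date) -> tuple[int, int]:
    """Hypothetical helper: return the half-open range [start, end + 1 day)."""
    if end_date < start_date:
        raise ValueError("end_date must not be earlier than start_date")
    # Adding one day to end_date makes the whole end day part of the range.
    return to_epoch_ms(start_date), to_epoch_ms(end_date + timedelta(days=1))


start_ms, end_ms = calculate_date_range_ms(date(2024, 1, 1), date(2024, 1, 1))

# The 23:59 one-minute bar of the end day falls inside the range.
last_bar_ms = end_ms - 60_000
last_bar_included = start_ms <= last_bar_ms < end_ms
```

A single-day request therefore spans exactly 86_400_000 ms, and a filter written as `start_ms <= ts < end_ms` keeps every bar of the end day without also picking up the first bar of the following day.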