chore(release): v0.4.0 — Feature release#38

Merged: lilinoct18-coder merged 34 commits into main from release/0.4.0 on Feb 17, 2026

@novis10813 (Owner)

## Summary
Prepare v0.4.0 release: bump version, add changelog and release notes.

## Key features
- VectorizedBacktester: signal→exposure mapping, portfolio schemes (market-neutral, long-only, top-N).
- Multi-horizon IC decay, factor correlation + clustering, orthogonalization utilities.
- Additional backtest metrics (Sortino, Calmar, win rate) and example notebooks.

## Checklist
- [x] Bump version to 0.4.0 ()
- [x] Add and 
- [ ] CI: run all tests & ensure coverage >= 80%
- [ ] Merge & create GitHub Release + tag 

## Notes
No breaking API changes are expected for typical user workflows. Please review CI results and docs before merge.

novis10813 and others added 30 commits February 12, 2026 19:13
- 01_momentum_factor_research.ipynb: complete workflow (data → factor → IC → backtest)
- 02_mean_reversion_factor.ipynb: mean reversion + cross-sectional processing
- 03_data_loading_and_exploration.ipynb: AggBar + timebar deep dive
- 04_multi_factor_combination.ipynb: multi-factor correlation, combo, selection
- examples/README.md: index and getting started guide
- Add _ensure_data_prepared() helper to FactorAnalyzer that auto-calls
  prepare_data() when _clean_data is not yet set
- Replace ValueError raises in calculate_ic(), calculate_turnover(),
  and calculate_quantile_returns() with auto-prepare calls
- Fix notebook 01 to explicitly call prepare_data() before plotting
  quantile returns (best practice documentation)
- Clear stale error outputs from notebook cells
- Fix equity_curve column names: start_time → end_time, equity → total_value
- Fix returns column name: portfolio_return → return
- Fix AggBarMetadata usage: meta.num_symbols → len(meta.symbols)
- Fix Factor.plot API: .plot.plot_timeseries() → .plot()
- Add jupyterlab to dev dependencies
- Create docs/dev/safe-operations.md with formal documentation:
  - NaN propagation rules (strict vs. ignore)
  - Division safety (EPSILON threshold)
  - Window size requirements
  - Degenerate-case handling (constant values, zero variance)
  - Edge cases (empty data, all NaN, single-element window)
  - Summary table of all safety patterns

- Add tests/factors/test_safe_operations.py with 57 regression tests:
  - safe_divide: scalar, array, Series, edge cases
  - Factor division: Factor/Factor, Factor/scalar, rtruediv
  - Factor.inverse(): near-zero guard
  - ts_* NaN propagation and window completeness
  - ts_* degenerate cases (constant values, zero variance)
  - cs_* NaN propagation (cross-sectional strict mask)
  - cs_neutralize degenerate case
  - Math operations (log, sqrt domain safety)
  - Multi-symbol consistency
  - EPSILON threshold boundary tests

- Update AGENTS.md/GEMINI.md to reference formal docs
- Add safe-operations.md to mkdocs.yml navigation

Closes #10
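
The division-safety rule described above (an EPSILON threshold on the denominator plus strict NaN propagation) might look roughly like the scalar sketch below. This is an illustration only, not the library's `safe_divide`: the `EPSILON` value, the strict `<` comparison, and the choice to return NaN for a near-zero denominator are all assumptions made for the example.

```python
import math

# Hypothetical threshold; the project's actual EPSILON value is not stated here.
EPSILON = 1e-10


def safe_divide(numerator: float, denominator: float, epsilon: float = EPSILON) -> float:
    """Divide two scalars, returning NaN instead of raising or overflowing.

    Assumed semantics for this sketch:
    - NaN in either operand propagates to the result ("strict" mode).
    - A denominator whose magnitude is below `epsilon` yields NaN.
    """
    if math.isnan(numerator) or math.isnan(denominator):
        return math.nan
    if abs(denominator) < epsilon:
        return math.nan
    return numerator / denominator


ok = safe_divide(1.0, 2.0)            # ordinary division
zero_den = safe_divide(1.0, 0.0)      # guarded: exact zero
tiny_den = safe_divide(1.0, 1e-12)    # guarded: below the threshold
nan_in = safe_divide(math.nan, 2.0)   # NaN propagates
```

Returning NaN rather than infinity keeps a single bad bar from dominating downstream rolling or cross-sectional statistics, which is presumably why a threshold is used instead of an exact zero check.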
When _calculate_date_range used datetime.now() directly (with arbitrary
time component like 09:32:22 UTC), the day-by-day processing loop in
load_aggbar would split the same 1-minute bucket across two adjacent
daily DuckDB queries. This produced duplicate bars with partial OHLCV
data at every daily boundary (e.g., every day at 09:32 UTC).

Fix: snap start/end to UTC midnight boundaries when no explicit dates
are provided. This ensures daily query boundaries always align with
bar interval bucket boundaries, eliminating duplicate bars.

Changes:
- loader.py: _calculate_date_range now uses today_midnight + 1 day as
  end and end - days as start
- downloader.py: same alignment for consistency
- tests: update freeze_time assertions to expect midnight-aligned dates,
  add test_midnight_alignment_regardless_of_time
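
The midnight alignment described in this commit can be sketched with the standard library alone. The function name `midnight_aligned_range` and its `now` parameter are hypothetical and exist only for this example; the arithmetic follows the commit text (end is today's UTC midnight plus one day, start is end minus `days`).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def midnight_aligned_range(
    days: int, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return (start, end) snapped to UTC midnight boundaries.

    `now` is injectable so the behaviour can be checked without freezing time;
    it must be timezone-aware when provided.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Drop the time-of-day component so daily query boundaries never cut
    # through the middle of a bar interval.
    today_midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = today_midnight + timedelta(days=1)
    start = end - timedelta(days=days)
    return start, end


# The same range results regardless of the time of day at which it is computed.
morning = datetime(2026, 2, 12, 9, 32, 22, tzinfo=timezone.utc)
evening = datetime(2026, 2, 12, 23, 59, 59, tzinfo=timezone.utc)
start, end = midnight_aligned_range(7, now=morning)
```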
docs: document and test safe_* operations semantics
feat: add example notebooks for common research workflows
…tion

- Create calculate_date_range() in data/utils.py for reusable date logic
- Remove duplicate _calculate_date_range() from BinanceDataLoader and BinanceDataDownloader
- Use shared get_market_string() from parquet.py instead of local implementation
- Update tests to call utility function directly instead of private methods
- Export calculate_date_range from data module public API

This addresses PR review feedback to consolidate duplicated time calculation logic.
fix: align daily processing boundaries to UTC midnight
restore(universe): re-apply universe/checklist integration reverted in #27
docs(examples): add universe/checklist workflow notebook
docs(universe): add universe/checklist user guide and integrations
test(data): remove cache_dir deprecation warnings in cache tests
- Move _run_async import to module level in metadata.py and tags.py
- Patch local module reference in tests to correctly mock _run_async
- Fix NameError in test_metadata.py due to missing import
fix: address review issues across date range, universe rules, and tag fetching
- Restore all src/ and tests/ from dev branch (merge conflict resolution
  had incorrectly reverted to main's older code)
- Fix 19 occurrences of dates.astype(np.int64) // 10**6 pattern in tests
  to use dates.astype('datetime64[ms]').astype(np.int64) for pandas 3.0
  compatibility (DatetimeIndex default resolution changed from ns to us)
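
The reason the `// 10**6` pattern is fragile can be shown with plain NumPy `datetime64` arrays, used here as a stand-in for a pandas `DatetimeIndex`. This is a minimal sketch; the helper name `to_epoch_ms` is hypothetical. Integer division by 10**6 only yields milliseconds when the underlying unit is nanoseconds, whereas casting to `datetime64[ms]` first gives the same answer for any input resolution.

```python
import numpy as np

# One timestamp stored at two different resolutions.
dates_ns = np.array(["2024-01-01T00:00:00"], dtype="datetime64[ns]")
dates_us = dates_ns.astype("datetime64[us]")


def to_epoch_ms(dates: np.ndarray) -> np.ndarray:
    """Convert datetime64 values of any resolution to epoch milliseconds."""
    return dates.astype("datetime64[ms]").astype(np.int64)


# Fragile pattern: silently assumes the raw integers are nanoseconds.
fragile_ns = dates_ns.astype(np.int64) // 10**6  # milliseconds, as intended
fragile_us = dates_us.astype(np.int64) // 10**6  # seconds, not milliseconds

# Robust pattern: resolution-independent.
robust_ns = to_epoch_ms(dates_ns)
robust_us = to_epoch_ms(dates_us)
```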

@lilinoct18-coder (Collaborator) left a comment

nice work

@lilinoct18-coder merged commit 0a6b2b6 into main on Feb 17, 2026
3 checks passed