Variable series length support for foundation models by Kurokabe · Pull Request #3125 · unit8co/darts

Kurokabe · 2026-05-29T13:49:33Z

Here is my draft PR to support variable-length fine-tuning and inference on foundation models.

The main changes are:

VariableLengthTorchTrainingDataset (new class in training_dataset.py): a TorchTrainingDataset subclass that accepts series shorter than input_chunk_length by left-padding the past window with NaN. This allows fit_from_dataset() to handle heterogeneous datasets without requiring per-window input_chunk_length tuning or silently dropping short series. Covariates and sample weights are intentionally not supported for now.
FoundationModel._build_inference_dataset override (new method in foundation_model.py): transparently left-pads short series with NaN before passing them to SequentialTorchInferenceDataset, so that predict() works on short series without any manual pre-processing from callers. The padding logic mirrors what VariableLengthTorchTrainingDataset does during training.
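A minimal NumPy sketch of the left-padding described above. It assumes each series is a (time, components) array, and `left_pad_with_nan` and `last_training_sample` are hypothetical helpers for illustration only. This is not the code in training_dataset.py or foundation_model.py. The same helper covers both cases: during training it pads the past window in front of the target, and at inference it pads the tail of the series passed to the model.

```python
import numpy as np


def left_pad_with_nan(values, input_chunk_length):
    # Keep at most the last input_chunk_length steps, then prepend NaN rows
    # so the result always has exactly input_chunk_length rows.
    values = np.asarray(values, dtype=float)[-input_chunk_length:]
    n_missing = input_chunk_length - values.shape[0]
    pad = np.full((n_missing,) + values.shape[1:], np.nan)
    return np.concatenate([pad, values], axis=0)


def last_training_sample(values, input_chunk_length, output_chunk_length):
    # Hypothetical: build one (past, future) pair from the end of a series,
    # left-padding the past window instead of rejecting a short series.
    values = np.asarray(values, dtype=float)
    if values.shape[0] <= output_chunk_length:
        raise ValueError("need at least one past step before the target window")
    past = values[:-output_chunk_length]
    future = values[-output_chunk_length:]
    return left_pad_with_nan(past, input_chunk_length), future
```

For a 10-step series with input_chunk_length=32 and output_chunk_length=4, the past window holds 26 NaN rows followed by the 6 observed values. At inference, calling `left_pad_with_nan(series, 32)` on the same series gives 22 NaN rows followed by all 10 values.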

Note that, for now, only inference has been tested end-to-end. dev_fev_tasks_mini_validation.ipynb and fev_tasks_mini.yaml are development artifacts I've included in case you want to reproduce the validation runs; they will be removed before merging.

One thing I can't fully explain: the notebook compares three approaches. Step 1 uses an adaptive input_chunk_length, processed window-by-window. Step 2 uses a fixed input_chunk_length=32 with manual NaN pre-padding before fit(), also processed window-by-window. Step 3 uses VariableLengthTorchTrainingDataset with the same fixed input_chunk_length=32 in a single pass over all series. Steps 2 and 3 match each other exactly, which validates that VariableLengthTorchTrainingDataset is equivalent to manual pre-padding. Step 1, however, produces different outputs, and I can't explain why. Since the only difference from step 2 is the input_chunk_length value used per window, I suspect a context-length effect rather than a batching artefact, but I'm not certain. Do you have any insight on this?
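To make the comparison concrete, here is a rough sketch of the past window each step would hand to the model for a 20-step series with input_chunk_length=32. The helper names are hypothetical and this is plain NumPy, not the notebook's code. Padding the series before windowing (step 2) and padding the window afterwards (step 3) yield identical arrays, while the adaptive approach (step 1) yields a shorter, unpadded context.

```python
import numpy as np


def step2_pad_series_then_window(values, input_chunk_length):
    # Manual pre-padding: pad the whole series first, then take a fixed-size window.
    n_missing = max(0, input_chunk_length - len(values))
    padded = np.concatenate([np.full(n_missing, np.nan), values])
    return padded[-input_chunk_length:]


def step3_window_then_pad(values, input_chunk_length):
    # Dataset-side padding: take the (possibly short) window, then pad it.
    window = values[-input_chunk_length:]
    return np.concatenate([np.full(input_chunk_length - len(window), np.nan), window])


def step1_adaptive_window(values, input_chunk_length):
    # Adaptive context: shrink the window to the available history, no padding.
    return values[-min(input_chunk_length, len(values)):]


series = np.arange(20, dtype=float)
a = step2_pad_series_then_window(series, 32)
b = step3_window_then_pad(series, 32)
c = step1_adaptive_window(series, 32)
print(np.array_equal(a, b, equal_nan=True), a.shape, c.shape)  # True (32,) (20,)
```

This only shows that the model receives contexts of different lengths in step 1. Whether a given foundation model treats NaN-padded positions the same way as a shorter context is model-specific, and this sketch cannot confirm it.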

One idea I had for a potential follow-up: instead of a dedicated VariableLengthTorchTrainingDataset, we could relax the short-series validation in ShiftedTorchTrainingDataset (currently a hard error in _get_end_of_output_idx) and handle the NaN padding in a collate_fn passed to the DataLoader.
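A sketch of what such a collate function might look like. It assumes each sample is a simple (past, future) pair of NumPy arrays; the actual ShiftedTorchTrainingDataset samples likely carry more elements (e.g. covariates), so this is illustrative only. `nan_left_pad_collate` is a hypothetical name.

```python
from functools import partial

import numpy as np


def nan_left_pad_collate(batch, input_chunk_length):
    # Stack (past, future) samples into batch arrays, left-padding any past
    # window shorter than input_chunk_length with NaN rows.
    pasts, futures = [], []
    for past, future in batch:
        past = np.asarray(past, dtype=float)[-input_chunk_length:]
        n_missing = input_chunk_length - past.shape[0]
        pad = np.full((n_missing,) + past.shape[1:], np.nan)
        pasts.append(np.concatenate([pad, past], axis=0))
        futures.append(np.asarray(future, dtype=float))
    return np.stack(pasts), np.stack(futures)


# Bind the fixed context length so the DataLoader can call collate(batch).
collate = partial(nan_left_pad_collate, input_chunk_length=32)
```

Such a function could be passed as `collate_fn=collate` to `torch.utils.data.DataLoader`; a real version would likely convert the stacked arrays to tensors. The design trade-off is that padding moves out of the dataset, so the dataset itself would only need to stop rejecting short series.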

Let me know what you think :)


review-notebook-app · 2026-05-29T13:49:38Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2026-05-29T14:08:42Z

Codecov Report

❌ Patch coverage is 19.73684% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.15%. Comparing base (40af46d) to head (e71a8e2).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
...arts/utils/data/torch_datasets/training_dataset.py	9.23%	59 Missing ⚠️
darts/models/forecasting/foundation_model.py	81.81%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3125      +/-   ##
==========================================
- Coverage   96.54%   96.15%   -0.39%     
==========================================
  Files         160      160              
  Lines       17261    17361     +100     
==========================================
+ Hits        16664    16693      +29     
- Misses        597      668      +71


wip: Add VariableLengthTorchTrainingDataset to authorize variable length inputs and pre-pad smaller inputs during inference for foundation models

e71a8e2

Kurokabe wants to merge 1 commit into master from variable_length_dataset
1 participant