The BLAST dataset is a large-scale time-series corpus created specifically for Universal Forecasting Models. Thanks to its rich and diverse samples, BLAST enables faster convergence, notable reductions in computational cost, and superior performance even with limited resources. The BLAST data is now available on Hugging Face.
BasicTS provides native support for BLAST and can be used to train models such as TimeMoE (decoder-only architecture) and ChronosBolt (encoder-decoder architecture). Before you start, follow the BasicTS README to clone the codebase and set up the environment.
Run the following commands from the BasicTS root directory to download BLAST from Hugging Face:
```shell
cd /path/to/BasicTS
huggingface-cli download ZezhiShao/BLAST \
    --repo-type dataset \
    --local-dir ./datasets/BLAST
```

After the download finishes, the data will be under `datasets/BLAST`.
Because Hugging Face limits single-file sizes, the raw dataset is split into multiple shards. Use the official script to merge them and convert the data into the format expected by BasicTS:
```shell
cd /path/to/BasicTS
python scripts/data_preparation/BLAST/merge_data.py --clean_cache True
```

After merging, the directory becomes:
```text
datasets/BLAST
├── train                # training set
│   ├── data.dat         # memory-mapped sequence data
│   └── shape.npy        # (19800000, 4096)
└── valid                # validation set
    ├── data.dat
    └── shape.npy        # (200000, 4096)
```
- `data.dat`: floating-point sequences stored with NumPy memory mapping.
- `shape.npy`: a 2-element NumPy array giving (number of samples, sequence length).

BLAST is stored as float32.
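The layout above can be read back with `np.memmap`. The sketch below is an illustration, not the BasicTS loader itself: it builds a tiny stand-in dataset (2 samples of length 8 instead of 19,800,000 × 4096) in a temporary directory, then opens it the same way — read the dimensions from `shape.npy` and map `data.dat` as a raw float32 buffer.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for datasets/BLAST/train, written in the same format.
root = tempfile.mkdtemp()
shape = (2, 8)
np.save(os.path.join(root, "shape.npy"), np.array(shape))
mm = np.memmap(os.path.join(root, "data.dat"), dtype=np.float32,
               mode="w+", shape=shape)
mm[:] = np.arange(16, dtype=np.float32).reshape(shape)
mm.flush()

# Reading: shape.npy gives the dimensions, data.dat is memory-mapped
# read-only, so samples are paged in on demand instead of loaded at once.
loaded_shape = tuple(np.load(os.path.join(root, "shape.npy")))
data = np.memmap(os.path.join(root, "data.dat"), dtype=np.float32,
                 mode="r", shape=loaded_shape)
print(data.shape)   # (2, 8)
print(data[1, :3])
```

Memory mapping is what makes the 19.8M-sample training split practical: a `DataLoader` worker can slice out individual sequences without holding the full ~300 GB buffer in RAM.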
Important
NaN values from the original data are preserved. Handle these NaNs appropriately when loading the data, according to the requirements of your model.
See the data loaders for TimeMoE and ChronosBolt for examples.
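As one hedged example of such handling (a common pattern in forecasting pipelines, not necessarily what every model requires): keep a boolean validity mask alongside the sequence and zero-fill the gaps, so the model can exclude the masked positions from its loss.

```python
import numpy as np

# Toy sequence with NaN gaps, as preserved in BLAST.
seq = np.array([1.0, 2.0, np.nan, 4.0, np.nan], dtype=np.float32)

mask = ~np.isnan(seq)                 # True where the value is observed
filled = np.nan_to_num(seq, nan=0.0)  # NaN -> 0.0; the fill value is a
                                      # model-specific choice

print(mask)    # [ True  True False  True False]
print(filled)  # [1. 2. 0. 4. 0.]
```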
BasicTS currently supports training TimeMoE and ChronosBolt on BLAST.
- Download the Pre-trained Weights
If you have network restrictions, you can switch to a mirror endpoint (commented in the example):
```shell
# Example for a mainland-China mirror (uncomment to use)
# export HF_ENDPOINT=https://hf-mirror.com
cd /path/to/BasicTS
huggingface-cli download autogluon/chronos-bolt-base \
    --repo-type model \
    --local-dir ./baselines/ChronosBolt/ckpt/chronos-bolt-base/
huggingface-cli download autogluon/chronos-bolt-small \
    --repo-type model \
    --local-dir ./baselines/ChronosBolt/ckpt/chronos-bolt-small/
```
Important
These checkpoints are used only as a convenient way to initialize the model structure; the weights are re-initialized randomly, and training does not continue from the pre-trained weights.
- Start Training
The example below trains the Chronos-small configuration on eight GPUs. To train the base version, uncomment the corresponding line.
```shell
python experiments/train.py \
    -c baselines/ChronosBolt/config/chronos_small.py \
    -g '0,1,2,3,4,5,6,7'

# python experiments/train.py \
#     -c baselines/ChronosBolt/config/chronos_base.py \
#     -g '0,1,2,3,4,5,6,7'
```
- Download the Pre-trained Weights
```shell
# If your network is restricted, switch to a mirror first:
# export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Maple728/TimeMoE-50M --repo-type model \
    --local-dir ./baselines/TimeMoE/ckpt/TimeMoE-50M
huggingface-cli download Maple728/TimeMoE-200M --repo-type model \
    --local-dir ./baselines/TimeMoE/ckpt/TimeMoE-200M
```

- Start Training
```shell
# Train TimeMoE-base on 8 GPUs
python experiments/train.py \
    -c baselines/TimeMoE/config/timemoe_base.py \
    -g '0,1,2,3,4,5,6,7'

# To train the large variant, uncomment the lines below
# python experiments/train.py \
#     -c baselines/TimeMoE/config/timemoe_large.py \
#     -g '0,1,2,3,4,5,6,7'
```

Tip
- The default config is tuned for BLAST (batch size, learning rate, and other hyper-parameters). Adjust them if your hardware (e.g., the number or memory of your GPUs) differs.
If you encounter any issues, feel free to open an issue on BasicTS Issues. Happy training!