bigbases/TSDecomp

STFD: Sliding Time-Frequency Decomposition

Reference implementation and experiment suite for STFD (Sliding Time-Frequency Decomposition), a decomposition-based preprocessing method for time series forecasting. The code is built on top of the official codebase of Unpacking the Trend (Kreuzer et al., DMKD 2025), extended with STFD and the additional baselines reported in the paper.

The forecasting backbones (DLinear, LSTM, TimeMixer, iTransformer) follow the standard preprocessing-then-forecast pipeline: a series is first decomposed into per-timestep features, and a backbone is trained on those features.

Quick start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Prepare datasets (see the "Datasets" section below).
#    Files are expected under data/UTS/ (univariate) and data/MTS/ (multivariate).

# 3. Run a single experiment (manual mode)
python exp_local.py --algorithm iTransformer --dataset ett_h1 --decomp stfd \
    --stfd_window_size 12 --stfd_kernel_size 13 --seed 0 --save

# 4. Run the full grid (batch mode)
python exp_local.py --batch

Method

STFD turns a univariate signal X of length L into a stack of per-timestep features (paper Algorithm 1):

  1. Trend T = MovingAverage(X, kernel=k).
  2. Seasonal S = X - T.
  3. For each sliding window of S (size w, stride 1), take the FFT within the window (dim=-1), keep the first w//2 + 1 bins (a real signal's FFT is conjugate-symmetric, so the remaining bins are redundant), and split them into real and imaginary parts.
  4. Align X, T, S with the windows and concatenate [X, T, S, real, imag] as the per-timestep feature vector.

In the default rfft mode the feature count is n_features = 3 + 2 * (w//2 + 1), which equals w + 5 for even w and w + 4 for odd w (for example, w = 12 gives 17 features). Multivariate inputs are handled channel-independently (each variable is preprocessed separately), matching the Unpacking the Trend framework. Because STFD features are not additive (they cannot be summed back into the series the way trend and seasonal components can), the first component of the backbone output (X) is used as the forecast, via a dedicated reconstruction wrapper.
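The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code in decomposition/stfd.py: the function names are hypothetical, the moving average is assumed to be centered with replicated edges, and each window is assumed to align with its last sample. No window padding is applied, so the output has L - w + 1 rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_average(x: np.ndarray, k: int) -> np.ndarray:
    """Centered moving average; edges are padded by repeating the end values (assumption)."""
    left = (k - 1) // 2
    right = k - 1 - left
    padded = np.pad(x, (left, right), mode="edge")
    return np.convolve(padded, np.ones(k) / k, mode="valid")  # length L


def stfd_features(x: np.ndarray, w: int, k: int) -> np.ndarray:
    """Return per-timestep features of shape (L - w + 1, 3 + 2 * (w // 2 + 1))."""
    trend = moving_average(x, k)                    # step 1
    seasonal = x - trend                            # step 2
    windows = sliding_window_view(seasonal, w)      # (L - w + 1, w), stride 1
    spec = np.fft.rfft(windows, axis=-1)            # step 3: FFT within each window
    # Assumption: a window is aligned with its last sample, so the first
    # w - 1 timesteps have no complete window and are dropped.
    aligned = np.stack([x, trend, seasonal], axis=-1)[w - 1:]
    return np.concatenate([aligned, spec.real, spec.imag], axis=-1)  # step 4


x = np.sin(2 * np.pi * np.arange(96) / 24) + 0.01 * np.arange(96)
features = stfd_features(x, w=12, k=13)
print(features.shape)  # (85, 17)
```

With w = 12 the feature count is 3 + 2 * 7 = 17, and column 0 is the raw series, which is the component the reconstruction wrapper would read back as the forecast.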

Directory structure

.
├── README.md                       # this file
├── requirements.txt
│
├── config.py                       # dataset / decomposition settings (STFD params included)
├── data_reading.py                 # dataset loaders
├── forecasting_dataset.py          # decomposition dispatch + train/val/test windowing
├── model_config.py
├── metrics.py
├── utility.py
│
├── decomposition/
│   ├── moving_average.py           # local: moving-average decomposition
│   ├── fourier_bandlimited.py      # local: band-limited Fourier
│   ├── fourier_topk.py             # local: top-k Fourier
│   ├── wavelet.py                  # local: wavelet
│   ├── ewt.py                      # local: Empirical Wavelet Transform
│   ├── modwt.py                    # local: MODWT (via pywt.swt)
│   ├── vmd_fixed.py                # local: VMD with fixed K
│   ├── stfd.py                     # STFD (the proposed method)
│   ├── decomposition_pytorch.py    # in-graph moving-average (used by DLinear)
│   └── global_baselines.py         # global wrappers: EMD/EEMD/CEEMDAN/VMD/EWT/SSA
│
├── models/
│   ├── DLinear/, LSTM/, TimeMixer/, iTransformer/   # forecasting backbones
│   ├── input_decomposition_wrapper.py  # adds the 'first' reconstruction mode (for STFD)
│   ├── create_model.py             # builds backbone + decomposition wrapper
│   ├── train_model.py              # training / validation / prediction loop
│   └── losses.py, masking.py, tools.py
│
│   # ----- Experiments (main results) -----
├── exp_local.py                    # local comparison (STFD vs local baselines, leakage-free)
└── exp_global_ensemble.py          # global decomposition-ensemble (EMD-LSTM family, with leakage)

Usage

Manual mode (parameter tuning, debugging)

# STFD, single run
python exp_local.py --algorithm iTransformer --dataset ett_h1 --decomp stfd \
    --stfd_window_size 12 --stfd_kernel_size 13 --stfd_padding_mode replicate --seed 0

# Compare against other local decompositions
python exp_local.py --algorithm iTransformer --dataset ett_h1 --decomp wavelet --seed 0
python exp_local.py --algorithm iTransformer --dataset ett_h1 --decomp none --seed 0

# Persist results
python exp_local.py ... --save

STFD-specific flags: --stfd_window_size (w), --stfd_kernel_size (k), --stfd_padding_mode (none | reflect | cyclic | replicate), and --stfd_use_full_fft (0/1). Defaults are taken from config.py.

Batch mode (full grid)

# Every (algorithm, dataset, decomposition) combination
python exp_local.py --batch

# A specific subset
python exp_local.py --batch --algorithms iTransformer TimeMixer \
    --datasets ett_h1 exchange_rate illness --decomps stfd wavelet moving_avg \
    --n_seeds 3

The local comparison defaults to four backbones (DLinear, LSTM, TimeMixer, iTransformer) and nine decompositions (none, moving_avg, fourier_bandlimited, fourier_topk, wavelet, ewt, modwt, vmd_fixed, stfd).
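To get a feel for the size of the default grid, the combinations can be enumerated as below. This is only a counting sketch: the dataset list is copied from the "Datasets" section of this README, and how exp_local.py actually iterates (ordering, skipping, seeds) is not shown here.

```python
from itertools import product

algorithms = ["DLinear", "LSTM", "TimeMixer", "iTransformer"]
decomps = ["none", "moving_avg", "fourier_bandlimited", "fourier_topk",
           "wavelet", "ewt", "modwt", "vmd_fixed", "stfd"]
# Default datasets as listed in this README (3 univariate + 7 multivariate).
datasets = ["m4_h", "m4_m", "m4_y", "ett_h1", "ett_h2", "ett_m1", "ett_m2",
            "weather", "exchange_rate", "illness"]

grid = list(product(algorithms, datasets, decomps))
print(len(grid))  # 360
```

Each combination is then repeated once per seed, so --n_seeds 3 on the full grid would mean 1,080 training runs.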

STFD hyperparameter ablation

python exp_local.py --ablation_stfd --datasets ett_h1 --algorithms iTransformer

Global decomposition-ensemble (with leakage caveat)

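The global protocol decomposes the full series before splitting it into train, validation, and test parts, so it contains data leakage by construction. It then trains one backbone per IMF (intrinsic mode function, i.e. one extracted component) and sums the per-component forecasts. This is the canonical EMD-LSTM / VMD-LSTM setup.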
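Why decomposing before the split leaks information can be seen with a small experiment. The sketch below uses a centered moving average as a stand-in for EMD/VMD (an assumption made purely to keep the example short) and compares the trend computed on the full series with the trend computed on the training part only.

```python
import numpy as np


def centered_ma(x: np.ndarray, k: int) -> np.ndarray:
    """Centered moving average with replicated edges (stand-in decomposition)."""
    left = (k - 1) // 2
    padded = np.pad(x, (left, k - 1 - left), mode="edge")
    return np.convolve(padded, np.ones(k) / k, mode="valid")


rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))  # synthetic random walk
split, k = 150, 13

trend_global = centered_ma(series, k)[:split]   # decomposed before splitting
trend_local = centered_ma(series[:split], k)    # decomposed on the training part only

# The two trends disagree only near the split, where the centered window
# of the global decomposition reaches into the test data.
diff = np.abs(trend_global - trend_local)
print(np.flatnonzero(diff > 1e-12))
```

The positions that differ are the last (k - 1) // 2 training timesteps: their global trend values depend on samples the model is later evaluated on.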

python exp_global_ensemble.py --batch
python exp_global_ensemble.py --batch --decomps emd_global vmd_global ceemdan_global

Decomposition is seed-independent and expensive (e.g. CEEMDAN), so results are cached under cache_global_decomp/. Set STFD_GLOBAL_WORKERS=N to parallelise the per-series decompositions across N processes.

Inspecting results

Results are written as pickle files under results/ (see the "Result format" section below for the exact structure).

Datasets

Place dataset files under data/UTS/ (univariate) and data/MTS/ (multivariate). The default batch run evaluates the datasets listed in config.py (dataset_names):

Univariate: m4_h (M4 Hourly), m4_m (M4 Monthly), m4_y (M4 Yearly).

Multivariate: ett_h1, ett_h2, ett_m1, ett_m2 (ETT), weather, exchange_rate, illness.

data_reading.py supports additional datasets (e.g. m4_w, m4_d, m4_q, nn5, tourism, m3_*) that can be requested explicitly via --datasets. If you add a new dataset to dataset_names, also extend the periods, stride_lengths, d_model, and d_ff maps in config.py.

To keep runtimes manageable, the large M4 subsets (m4_m, m4_d, m4_q) are deterministically sub-sampled to 2,000 series (read_m4(path, max_series=2000), random_state=42). Set max_series=None in data_reading.py to use the full sets.
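Deterministic sub-sampling of this kind can be sketched as follows. The helper subsample_ids is hypothetical and only illustrates the idea of a fixed-seed draw; the actual selection logic inside read_m4 may differ (for instance, it may use a different random generator or sample rows of a DataFrame).

```python
import numpy as np


def subsample_ids(series_ids, max_series=2000, random_state=42):
    """Pick at most max_series ids reproducibly; max_series=None keeps everything."""
    ids = list(series_ids)
    if max_series is None or len(ids) <= max_series:
        return ids
    rng = np.random.default_rng(random_state)
    keep = rng.choice(len(ids), size=max_series, replace=False)
    return [ids[i] for i in sorted(keep)]  # preserve the original order


ids = [f"M{i}" for i in range(1, 48001)]  # M4 Monthly contains 48,000 series
subset = subsample_ids(ids)
print(len(subset))  # 2000
```

Because the seed is fixed, every run sees the same 2,000 series, which keeps results comparable across seeds and decompositions.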

Dataset sources: ETT (https://github.com/zhouhaoyi/ETDataset), Exchange/Weather/Illness (https://github.com/thuml/Autoformer), M4 (https://github.com/Mcompetitions/M4-methods), and the Monash time series forecasting repository for the remaining series.

Result format

results/exp_local/{algorithm}/{dataset}.pkl
  -> {decomp_name: {mse, mae, mape, smape, mase, owa, std_*, n_params,
                    train_time, inference_time, effective_input_length,
                    horizon, backhorizon, decomp_params}}

results/exp_global_ensemble/{algorithm}/{dataset}.pkl
  -> same shape, plus 'is_global', 'is_ensemble', 'n_imf_trained',
     'n_imf_intended', and 'leakage_caveat'
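Given the layout above, a result file can be loaded and ranked with a few lines of Python. The helper names load_results and rank_by are hypothetical and not part of the repository; the sketch assumes each metric entry (such as mse) is a plain number. Only unpickle files you produced yourself, since pickle can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_results(algorithm, dataset, root="results/exp_local"):
    """Load {decomp_name: {metric: value, ...}} for one backbone and dataset."""
    path = Path(root) / algorithm / f"{dataset}.pkl"
    with path.open("rb") as f:
        return pickle.load(f)


def rank_by(results, metric="mse"):
    """Return (decomp_name, value) pairs sorted from best (lowest) to worst."""
    return sorted(((name, m[metric]) for name, m in results.items()),
                  key=lambda pair: pair[1])


# Print a ranking only if the file exists, so the sketch runs on a fresh clone.
if (Path("results/exp_local") / "iTransformer" / "ett_h1.pkl").exists():
    for name, value in rank_by(load_results("iTransformer", "ett_h1")):
        print(f"{name:>20s}  {value:.4f}")
```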

Implementation notes

The sliding FFT is taken within each window (dim=-1). The original upstream code applied the FFT along the window-position axis (dim=1); that is corrected here so the transform matches a proper short-time Fourier transform.
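The difference between the two axes is easy to see on a toy signal. In the sketch below (NumPy rather than PyTorch, with axis 0 playing the role of the window-position axis), a cosine with exactly two cycles per window is cut into sliding windows and transformed along each axis in turn.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

w = 12
t = np.arange(60)
signal = np.cos(2 * np.pi * 2 * t / w)        # exactly 2 cycles per window
windows = sliding_window_view(signal, w)      # (49, 12): (window position, sample in window)

within = np.fft.rfft(windows, axis=-1)        # FFT inside each window -> (49, 7)
across = np.fft.rfft(windows, axis=0)         # FFT across window positions -> (25, 12)

print(within.shape, across.shape)             # (49, 7) (25, 12)
print(np.abs(within).argmax(axis=-1))         # every entry is 2
```

Transforming within the window yields one local spectrum per timestep, and every window correctly reports its energy in bin 2. Transforming across window positions instead yields a spectrum per in-window offset, which is no longer a per-timestep feature.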

With padding_mode='none' the effective input length shrinks to L - w + 1 (the paper's original behaviour). The padding modes (reflect, cyclic, replicate) preserve length L, which keeps the input comparable to the other baselines; replicate is used for the paper's main results.
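The effect of the padding modes on the number of windows can be illustrated as follows. This is a sketch under two assumptions that the README does not state: the w - 1 padded samples are placed on the left of the series, and the three modes correspond to NumPy's reflect, wrap, and edge padding. The helper sliding_windows is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed mapping from the README's mode names to numpy.pad modes.
NP_MODE = {"reflect": "reflect", "cyclic": "wrap", "replicate": "edge"}


def sliding_windows(s: np.ndarray, w: int, padding_mode: str = "replicate") -> np.ndarray:
    """Return stride-1 windows of size w; padded modes yield one window per timestep."""
    if padding_mode != "none":
        # Assumption: pad w - 1 samples on the left so window i ends at timestep i.
        s = np.pad(s, (w - 1, 0), mode=NP_MODE[padding_mode])
    return sliding_window_view(s, w)


s = np.arange(96, dtype=float)
for mode in ["none", "reflect", "cyclic", "replicate"]:
    print(mode, sliding_windows(s, 12, mode).shape)
# none (85, 12); the three padded modes give (96, 12)
```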

Requirements

See requirements.txt. Beyond the upstream dependencies, the additional baselines require ewtpy (EWT), vmdpy (VMD), and EMD-signal (EMD/EEMD/CEEMDAN; the package is imported as PyEMD).
