Skip to content

Add EDM delay embedding infrastructure and tests#1528

Open
aman-raj-srivastva wants to merge 17 commits into
NOAA-FIMS:mainfrom
aman-raj-srivastva:feature/edm-distribution-class
Open

Add EDM delay embedding infrastructure and tests#1528
aman-raj-srivastva wants to merge 17 commits into
NOAA-FIMS:mainfrom
aman-raj-srivastva:feature/edm-distribution-class

Conversation

@aman-raj-srivastva

@aman-raj-srivastva aman-raj-srivastva commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR implements the initial delay embedding infrastructure for Empirical Dynamic Modeling (EDM) in FIMS as part of the GSoC 2026 project: “Add empirical dynamic model to FIMS”

This PR completes the main deliverables for:


What this PR adds

Core Delay Embedding Infrastructure

Implemented a reusable delay embedding generator for univariate time series in:

inst/include/edm/functors/delay_embedding.hpp

Features include:

  • configurable embedding dimension (E)
  • configurable time lag (tau)
  • row-major embedding matrix generation
  • target-time indexing
  • missing observation handling (-999 sentinel support)
  • input validation and error handling
  • Doxygen documentation

Rcpp Interface

Added EDM Rcpp integration in:

inst/include/interface/rcpp/rcpp_objects/rcpp_edm.hpp

This includes:

  • DelayEmbeddingInterface
  • R-accessible embedding construction
  • missing-value filtering support
  • module registration through RCPP_MODULE(fims)

FIMSFrame Integration

Extended FIMSFrame to support EDM embeddings by adding:

  • edm_embeddings storage slot
  • helper/accessor functions
  • embedding creation utilities
  • matrix conversion helpers for future EDM prediction workflows

Testing

C++ (GoogleTest)tests/gtest/test_edm_delay_embedding.cpp
6 test cases covering: basic 2D/3D embeddings, configurable lag,
short-series errors, invalid E/tau, and missing-value filtering.

R — Rcpp interfacetests/testthat/test-rcpp-edm.R
3 test blocks, 18 assertions covering: construct(), construct_drop_missing(),
and element access via at().

R — FIMSFrametests/testthat/test-edm-fimsframe.R
6 test blocks covering: empty slot default, missing embedding error,
input validation, metadata correctness, matrix shape, and chaining.


Additional Work

  • Added TODO(EDM) comments across key FIMS workflow files to guide future EDM and likelihood integration.
  • Added "edm" support to fims_input_types.

Checklist

  • C++ implementation with Doxygen docs
  • Rcpp interface following existing FIMS patterns
  • FIMSFrame slot, accessors, and constructor updated
  • fims_input_types allowlist updated
  • GoogleTest unit tests pass
  • testthat integration tests pass
  • Uncertainty propagation (deferred — pending mentor input)

Related Issue

Closes #1484Implement Delay Embedding Generator for Time Series


Instructions for code reviewer

👋Hello reviewer👋, thank you for taking the time to review this PR!

  • Please use this checklist during your review, checking off items that you have verified are complete but feel free to skip over items that are not relevant!
  • See the GitHub documentation for how to comment on a PR to indicate where you have questions or changes are needed before approving the PR.
  • Please use standard conventional messages for both commit messages and comments
  • PR reviews are a great way to learn so feel free to share your tips and tricks. However, when suggesting changes to the PR that are optional please include nit: (for nitpicking) as the comment type. For example, nit: I prefer using a data.frame() instead of a matrix because ...
  • Engage with the developer. Make it clear when the PR is approved by selecting the approved status, and potentially commenting on the PR with something like This PR is now ready to be merged.

Checklist

  • The code is well-designed
  • The code is designed well for both users and developers
  • Code coverage remains high- [ ] Comments are clear, useful, and explain why instead of what
  • Code is appropriately documented (doxygen and roxygen)

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🎨 Chore: code formatting workflow

Our automated workflows cannot run on forks because of permission issues, and thus, we ask that you run the following code locally and push any changes that are created to your feature branch. You will only be reminded of this once per PR. Thank you!

Format C++ code

  1. Install clang-format version 18.0.0
  2. Run the following command from the repository root:
    clang-format -i --style="{BasedOnStyle: Google, SortIncludes: false}" $(find ./inst/include ./src ./tests/gtest -name "*.hpp" -o -name "*.cpp")

Format R code

  1. Install {styler} and {roxygen2}
  2. Run the following commands in R from the repository root:
styler::style_pkg() # Style R code
roxygen2::roxygenise() # Update documentation
styler::style_pkg() # Style R code again
roxygen2::roxygenise() # Update documentation again
usethis::use_tidy_description() # Style DESCRIPTION file

Push changes

  1. Commit the formatting with a commit message of "Chore: format feature branch"
  2. Push to your fork

@aman-raj-srivastva

Copy link
Copy Markdown
Contributor Author

On the uncertainty propagation, my assumption for now is to propagate observational uncertainty metadata (e.g. std error / variance vectors) along with the lag-coordinate embeddings, so later prediction and likelihood components can utilize them if needed. I just wanted to confirm if this direction makes sense regarding EDM / FIMS integration, before I move ahead with anything more elaborate.

Also, I saw that my branch has merge conflicts with main, apparently due to some of the recent upstream interface and module level refactors in module registration etc. Should I just rebase my branch onto the latest upstream, or is there a desired synchronization / target branch policy for EDM development branches going forwards?

@kellijohnson-NOAA

Copy link
Copy Markdown
Contributor

@aman-raj-srivastva sorry for the merge conflicts. Yes, rebasing to main is the best strategy. If you have problems, feel free to let me know and I can help out.

@aman-raj-srivastva

Copy link
Copy Markdown
Contributor Author

Thanks! I’ll go ahead and rebase onto the latest main and work through the conflicts. I’ll definitely reach out if I run into any issues during the rebase.

Aman and others added 6 commits June 9, 2026 21:59
added the comments to the following files:
	modified:   R/distribution_formulas.R
	modified:   R/fimsfit.R
	modified:   R/initialize_modules.R
	modified:   inst/include/common/information.hpp
	modified:   inst/include/common/model.hpp
	modified:   inst/include/distributions/functors/density_components_base.hpp
	modified:   inst/include/distributions/functors/normal_lpdf.hpp
	modified:   inst/include/interface/rcpp/rcpp_objects/rcpp_distribution.hpp
	modified:   src/FIMS.cpp
- Created rcpp_edm.hpp defining EDMInterfaceBase and DelayEmbeddingInterface wrapper classes.

- Exposed the construct(), construct_drop_missing(), and at() methods to R.

- Exposed fields/metadata (such as embedding_dimension, time_lag, n_rows, n_cols, values, and target_indices) to R.

- Registered the DelayEmbedding class in the RCPP_MODULE(fims) block within fims_modules.hpp.
- Add edm_embeddings list slot to the FIMSFrame S4 class to store
  delay-embedding results produced by create_edm_embedding()
- Add create_edm_embedding() helper that pulls a named time series
  from the data slot, calls the Rcpp DelayEmbedding module, and
  returns a new FIMSFrame with the result stored by name
- Add get_edm_embeddings() accessor following the existing get_*()
  pattern in fimsframe.R
- Add model_edm_matrix() accessor that returns a numeric matrix
  (n_rows x E) ready to pass to an EDM prediction module, following
  the existing model_*() pattern
- Register 'edm' as a valid FIMSFrame input type in
  data-raw/fims_input_types.R and regenerate data/fims_input_types.rda
- Add tests/testthat/test-edm-fimsframe.R with 6 test blocks covering
  default empty slot, error handling, input validation, metadata
  correctness, matrix shape, and chaining multiple embeddings

This completes proposal deliverable: 'Create structures in FIMSFrame
for embedded matrices' (Weeks 1-3).
@aman-raj-srivastva aman-raj-srivastva force-pushed the feature/edm-distribution-class branch from 7e64860 to dd00e76 Compare June 9, 2026 20:11
@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.04663% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.50%. Comparing base (4a2d5f3) to head (6a5994a).

Files with missing lines Patch % Lines
R/fimsframe.R 79.20% 26 Missing ⚠️
...t/include/interface/rcpp/rcpp_objects/rcpp_edm.hpp 84.17% 22 Missing ⚠️
inst/include/edm/functors/delay_embedding.hpp 96.55% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1528      +/-   ##
==========================================
- Coverage   87.53%   87.50%   -0.03%     
==========================================
  Files          97      101       +4     
  Lines        8749     9134     +385     
  Branches      523      559      +36     
==========================================
+ Hits         7658     7993     +335     
- Misses       1054     1103      +49     
- Partials       37       38       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread R/fimsframe.R Outdated
Comment thread inst/include/edm/functors/delay_embedding.hpp Outdated
Comment thread inst/include/edm/functors/delay_embedding.hpp Outdated
Comment thread inst/include/edm/functors/delay_embedding.hpp Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Delay Embedding Generator for Time Series

3 participants