
Dump configuration next to the output #522

Open
gtrevisan wants to merge 9 commits into dev from glt/config

Conversation

@gtrevisan
Member

this should help with reproducibility:

  • dump configuration next to the outputs as JSON,
  • drop tests structure (if not testing),
  • drop attributes structure (as nan and inf are supported by TOML but not by JSON, and they should be stored as metadata anyway),
  • add a debug statement.

in the future we might want to add this back in as xarray metadata, rather than a configuration dump.

Copilot AI (Contributor) left a comment

Pull request overview

This PR enhances reproducibility by dumping the configuration as JSON next to the output files. The changes filter out sensitive data (passwords), remove test-specific configuration when not running tests, and exclude the attributes structure that contains TOML-specific values (nan/inf) not compatible with JSON.

Changes:

  • Added filter_dict utility function to recursively filter dictionary keys containing a specified substring
  • Integrated configuration dumping in the main workflow to save sanitized config as JSON alongside outputs
  • Added debug logging for the config dump location
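Based on the description above, a minimal sketch of what such a `filter_dict` helper could look like (illustrative only; the actual implementation lives in `disruption_py/core/utils/misc.py` and may differ):

```python
def filter_dict(data: dict, substring: str) -> dict:
    """Recursively drop any key whose name contains the given substring."""
    filtered = {}
    for key, value in data.items():
        if substring in key:
            continue  # skip sensitive keys, e.g. anything containing "password"
        filtered[key] = filter_dict(value, substring) if isinstance(value, dict) else value
    return filtered
```

With a helper like this, dumping a sanitized configuration next to the outputs reduces to something like `json.dump(filter_dict(config, "password"), fh)`.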

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

  • disruption_py/workflow.py: Added imports and logic to dump sanitized configuration as JSON to the temporary folder, filtering passwords and conditionally removing test data
  • disruption_py/core/utils/misc.py: Added filter_dict utility function to recursively remove dictionary keys containing a specified substring


@gtrevisan gtrevisan requested a review from zapatace February 12, 2026 21:52
@yumouwei (Contributor) left a comment

I'm able to generate the config.json file with output_setting='dataset' or 'dataframe'. However, it currently does not work if I specify a file path (e.g. output_setting="<path_to_file>/temp.h5" or another format); it only generates the dataset file, not the config file in the specified folder.

Additionally, it would also be ideal to set the name of the config file to something like <dataset_file_name>.json or <dataset_file_name>_config.json, so that a user can distinguish them in case multiple dataset files are stored in the same folder.

@zapatace (Contributor) commented Feb 16, 2026

I tested a few different things...

First, I tried saving files using output_setting with a path, and I found that the JSON and log files are always saved in the /tmp folder. Is that the expected behavior?
For example:

from disruption_py.workflow import get_shots_data

shot_data = get_shots_data(
    tokamak="cmod",
    shotlist_setting=[1050420019],
    output_setting="./test.nc",
)

saves a file "test.nc" in the path specified in output_setting, but the JSON and log files are saved in the user's /tmp/. That's the problem @yumouwei had here #522 (review).

Second, I also noticed that if you rerun the last code, you get the expected error

output_setting.py", line 282, in to_disk
    raise FileExistsError(f"File already exists! {self.path}")

and, of course, no .nc file. However, this type of error is not registered in the log file.
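The error quoted above suggests a simple check-before-write guard; a hedged sketch of the pattern (not the actual output_setting.py code, which may differ):

```python
from pathlib import Path


def to_disk(path: str, data: bytes) -> None:
    """Write data to path, refusing to clobber an existing output file."""
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"File already exists! {target}")
    target.write_bytes(data)
```

A second call with the same path then raises FileExistsError instead of silently overwriting the previous run's output.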

Finally, I have a suggestion. I tried different time settings, shot lists, and output settings... It would be nice if all of these options in RetrievalSettings() or get_shots_data() were saved in the output JSON, so that even if the user loses the code, another user could reproduce the exact data retrieval in the future.

@gtrevisan (Member, Author) commented Feb 17, 2026

at present we automatically generate a unique temporary work folder where everything gets dumped by default:

  • output.log (old)
  • output.nc (old)
  • config.json (new)
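The default layout described above can be sketched as follows (the folder prefix and config contents are illustrative assumptions, not the framework's actual values):

```python
import json
import tempfile
from pathlib import Path

# unique temporary work folder, generated per run (prefix is illustrative)
workdir = Path(tempfile.mkdtemp(prefix="disruption_py_"))

# sanitized configuration dumped next to the other outputs
config = {"tokamak": "cmod", "shotlist_setting": [1050420019]}
json_file_path = workdir / "config.json"
json_file_path.write_text(json.dumps(config, indent=2))
```

The log file and netcdf output land in the same workdir unless the user redirects them explicitly.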

if one chooses to save the netcdf elsewhere, the result is obviously less consistent, e.g. the log file and config JSON still get generated in the temporary folder.

we should first design how we would like the framework to behave in all these edge cases, and then implement it with a (not crazy, but consistent) overhaul, which I'd say at the moment is not a priority.

for context: back in the day I implemented the unique temporary folder choice in order to have a place for all test executions to drop files and log files and to be able to debug them better, because previously nothing was saved.

it does not work if I specify the file path [...]; it will only generate the dataset file but not the config file in the specified folder.

@yumouwei you should enable debug statements to see where the config json file is stored:

logger.debug("Dumped configuration into: {path}", path=json_file_path)

EDIT: I've upgraded it to verbose, now.

Additionally, [...] set the name of the config file [...] so that a user can distinguish them in case there are multiple dataset files stored in the same folder.

@yumouwei what if the nc file does not exist, but the config file does? would you rename it at will, overwrite, or crash with an error?
let's tackle this in a separate PR if we deem it high enough in our priority list.

I found that the json and log file are always saved in /tmp folder. Is that the expected behavior?

@zapatace for now, yes, as I doubt people want to specify a folder for each of their runs.
you can already specify your log file to be created wherever you want, but I doubt people do that.
let's tackle this in a separate PR if we deem it high enough in our priority list.

if you rerun last code, you have the expected error [...] FileExistsError(f"File already exists! {self.path}")

@zapatace yes, I will not overwrite any data.
I'd suggest people leave the defaults and then move/copy/rename/hardlink their desired nc file.

However, this type of error is not registered in the log file.

@zapatace correct, I believe only workflow-specific errors get logged, but stuff like this isn't.
let's tackle this in a separate PR if we deem it high enough in our priority list.

Finally, I have a suggestion. I tried different time settings, shot_lists, and output settings… It would be nice that any of these options in RetrievalSettings() or get_shots_data() are saved in the output.json, so even if the code is lost by the user, another user in the future can reproduce the exact data retrieval.

@zapatace yes, that's the plan:

  1. save all the configs (ie: this PR),
  2. move all settings to configs (cf: Evaluate proper design for configurations and settings #445),
  3. rejoice in absolute reproducibility.

arguably the most important settings are currently already configurations, that is:

  • efit nickname setting for C-MOD,
  • code rundb tag for DIII-D,

final thoughts?

@gtrevisan gtrevisan requested a review from yumouwei February 17, 2026 15:32