New data loading #307

Open
cormacmlynch wants to merge 24 commits into main from new-data-loading-2


@cormacmlynch (Collaborator)

This pull request restructures how data is stored and read into the model. The main changes are in input_functions.py.

Excel-based master files and per-region, per-variable CSV files have been replaced by a single CSV file for each input variable. This makes data loading faster (the xlsx-to-CSV step is no longer needed, and far fewer files are read) and gives a machine-friendly format that works well with version control. The Polars library is also used in place of Pandas for speed gains.
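As a rough illustration of the one-file-per-variable idea, the sketch below reads a single long-format variable CSV into a dense region-by-technology-by-year structure. The column names (region, technology, year, value), the sample codes and the helper `load_variable` are hypothetical and not taken from the PR; the sketch also uses the standard-library `csv` module rather than Polars so that it runs without extra dependencies.

```python
import csv
import io

# Hypothetical long-format layout for one variable file (e.g. MEWR.csv).
# The real column names used by the PR may differ.
SAMPLE = """region,technology,year,value
BE,Coal,2020,1.0
BE,Solar,2020,0.5
DK,Coal,2020,2.0
"""


def load_variable(f, regions, techs, years):
    """Read one long-format variable file into a nested [region][tech][year] list.

    Cells that do not appear in the file are left at 0.0.
    """
    r_idx = {r: i for i, r in enumerate(regions)}
    t_idx = {t: i for i, t in enumerate(techs)}
    y_idx = {y: i for i, y in enumerate(years)}
    data = [[[0.0] * len(years) for _ in techs] for _ in regions]
    for row in csv.DictReader(f):
        r = r_idx[row["region"]]
        t = t_idx[row["technology"]]
        y = y_idx[int(row["year"])]
        data[r][t][y] = float(row["value"])
    return data


arr = load_variable(io.StringIO(SAMPLE), ["BE", "DK"], ["Coal", "Solar"], [2020])
print(arr[1][0][0])  # 2.0 (DK, Coal, 2020)
```

One read per variable replaces the many per-region files of the old layout; with Polars, the read itself would typically be a single `pl.read_csv` call followed by a reshape.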

The new data loading is implemented only for the power, heat, transport, and freight models, and only S0 data files are included for now. Scenarios work as they did before: a scenario does not need a full set of variable CSVs, only the files that differ from S0 (e.g., MEWR in a scenario with regulation on coal).
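The scenario rule above (use a scenario's own file if it exists, otherwise fall back to S0) can be sketched as a small path resolver. The folder layout `<inputs_dir>/<scenario>/<variable>.csv` and the function name `resolve_variable_file` are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def resolve_variable_file(inputs_dir, scenario, var):
    """Return the CSV path for `var`, preferring the scenario over S0.

    Assumed (hypothetical) layout: <inputs_dir>/<scenario>/<var>.csv
    """
    candidate = Path(inputs_dir) / scenario / f"{var}.csv"
    if candidate.is_file():
        return candidate
    # The scenario does not override this variable, so use the baseline file.
    baseline = Path(inputs_dir) / "S0" / f"{var}.csv"
    if not baseline.is_file():
        raise FileNotFoundError(f"{var}.csv not found in {scenario} or S0")
    return baseline


with tempfile.TemporaryDirectory() as d:
    for scen in ("S0", "S1"):
        (Path(d) / scen).mkdir()
    (Path(d) / "S0" / "MEWR.csv").write_text("baseline")
    (Path(d) / "S0" / "MEWG.csv").write_text("baseline")
    (Path(d) / "S1" / "MEWR.csv").write_text("coal regulation")
    mewr_source = resolve_variable_file(d, "S1", "MEWR").parent.name  # "S1"
    mewg_source = resolve_variable_file(d, "S1", "MEWG").parent.name  # "S0"
```

Because only overriding files live in the scenario folder, a scenario with coal regulation needs just its own MEWR.csv; everything else resolves to S0.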

First pass at implementing a new way to store and read in data files.

This method stores each variable as its own CSV file. Users are expected to edit the CSV files directly, and the CSVs are read straight into the model, removing the xlsx-to-CSV step. Polars is also used in place of Pandas for speed gains (it is not yet clear how much of the speed-up comes from this swap).

Also included is a script (convert_inputs_to_new_format.py) that converts old-style files to the new format.
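The conversion script itself is not shown here. As a hedged sketch of the kind of reshaping involved, the code below turns hypothetical old-style wide files (one per region, technologies as rows and years as columns) into a single long-format table. Both layouts and the helper `wide_to_long` are assumptions, not the script's real interface.

```python
import csv
import io


def wide_to_long(region, wide_file, writer):
    """Append rows from one old-style wide file (rows: technologies,
    columns: years) to `writer` as region,technology,year,value."""
    reader = csv.reader(wide_file)
    header = next(reader)  # first cell labels the technology column
    years = header[1:]
    for row in reader:
        tech, values = row[0], row[1:]
        for year, value in zip(years, values):
            writer.writerow([region, tech, year, value])


# Hypothetical contents of two per-region files for one variable.
old_files = {
    "BE": "tech,2020,2021\nCoal,1,2\nSolar,3,4\n",
    "DK": "tech,2020,2021\nCoal,5,6\nSolar,7,8\n",
}

out = io.StringIO()
w = csv.writer(out, lineterminator="\n")
w.writerow(["region", "technology", "year", "value"])
for region, text in old_files.items():
    wide_to_long(region, io.StringIO(text), w)

lines = out.getvalue().splitlines()
print(len(lines))  # 9: header + 2 regions x 2 technologies x 2 years
print(lines[1])    # BE,Coal,2020,1
```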

Model output has been tested for Tr, P, Fr and H. The shares results are the same as before.
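One way to make a "same results as before" check repeatable is to compare old and new outputs element-wise with a tolerance rather than exact equality, since tiny floating-point differences need not indicate a real change. The helper `results_match` below is a hypothetical sketch operating on flattened result lists, not part of the PR.

```python
import math


def results_match(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """True if two flat result sequences have the same length and all
    corresponding values are equal within the given tolerances."""
    if len(old) != len(new):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(old, new)
    )


old_shares = [0.2, 0.3, 0.5]
new_shares = [0.2, 0.3, 0.5 + 1e-13]  # tiny floating-point noise
changed = [0.25, 0.25, 0.5]           # a real change
print(results_match(old_shares, new_shares))  # True
print(results_match(old_shares, changed))     # False
```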
No longer needed as part of the new data loading.
Removing some old text left over from prototyping
@Femkemilene (Contributor)

@copilot: can you review all the .py files? No need to look at the file changes in the Inputs folder.

Femkemilene marked this pull request as ready for review on May 14, 2026, 09:12
Agent-Logs-Url: https://github.com/cpmodel/FTT_StandAlone/sessions/c7808ca3-18df-4bbb-b204-6c66dc5ff412

Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>

Copilot AI (Contributor) commented May 14, 2026

Reviewed all changed .py files outside Inputs/ and applied two fixes in commit a0b0003: (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py, and (2) restored explicit FileNotFoundError in SourceCode/support/dimensions_functions.py when VariableListing.csv is missing. No UI changes were made (so no screenshot applies).
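The point of an explicit FileNotFoundError is to fail early with a clear message rather than with an obscure error further into the load. A minimal sketch of that pattern follows; the function name `load_variable_listing` and the column names in the sample file are hypothetical, not the code in dimensions_functions.py.

```python
import csv
import tempfile
from pathlib import Path


def load_variable_listing(path):
    """Read VariableListing.csv into a list of row dicts.

    Raises FileNotFoundError with an explicit message if the file is missing.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"VariableListing.csv not found at {path}")
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as d:
    listing_path = Path(d) / "VariableListing.csv"
    try:
        load_variable_listing(listing_path)
        raised = False
    except FileNotFoundError:
        raised = True  # the missing file is reported explicitly
    listing_path.write_text("variable,domain\nMEWR,FTT-P\n")
    rows = load_variable_listing(listing_path)
```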

Copilot AI requested a review from Femkemilene May 14, 2026 09:18
@Femkemilene (Contributor)

@copilot: can you explain what you did here and why? (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py

Copilot AI (Contributor) commented May 14, 2026

In a0b0003, I removed the runtime sys.path edits because they make imports depend on execution context and can accidentally shadow modules. I replaced from titles_functions import load_titles with from SourceCode.support.titles_functions import load_titles so import resolution is explicit, stable, and consistent with package-style imports used elsewhere in the repo. This reduces environment-specific import bugs without changing conversion logic.

Femkemilene linked an issue on May 18, 2026 that may be closed by this pull request
@Femkemilene (Contributor)

@copilot: I've recently merged a PR into main. Can you merge main into this branch, so it's easier to assess that everything works together before I start reviewing the PR?

# Conflicts:
#	SourceCode/model_class.py
#	SourceCode/support/input_functions.py

Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>

Copilot AI (Contributor) commented May 26, 2026

Done — I merged main into this branch and pushed the merge commit 82ff0c0. There were conflicts in SourceCode/model_class.py and SourceCode/support/input_functions.py, and I resolved them by keeping this branch’s versions in those overlapping sections.

Femkemilene (Contributor) commented May 26, 2026

@cormacmlynch: In the previous version, we had a lot of error messages for when things went wrong. Data loading was more finicky before, so this may be less urgent now, but I was wondering whether removing this functionality was a conscious choice. Otherwise, it's lovely. I've tidied up slightly and hopefully made the GUI more stable (it gave a weird warning, which Claude fixed).

I think these are the types of data errors we need to check for:

  1. Missing data (?)
  2. Incorrect dimensions (too many or too few rows or columns).

These should be fairly clean to check for. My first attempt with Claude produced low-quality vibe code, so I have not committed it. Would you have time to reintroduce these two checks?
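For the first check, a hedged sketch: scan a variable's value column and report rows whose cell is empty, non-numeric or NaN. The long-format layout with a `value` column and the function name `find_missing_values` are assumptions; the dimension check is a separate concern.

```python
import csv
import io
import math


def find_missing_values(f, value_col="value"):
    """Return 1-based data-row numbers whose value is empty, non-numeric or NaN."""
    missing = []
    for i, row in enumerate(csv.DictReader(f), start=1):
        cell = (row.get(value_col) or "").strip()
        try:
            bad = math.isnan(float(cell))
        except ValueError:  # empty string or text that is not a number
            bad = True
        if bad:
            missing.append(i)
    return missing


sample = io.StringIO(
    "region,technology,year,value\n"
    "BE,Coal,2020,1.0\n"
    "BE,Solar,2020,\n"
    "DK,Coal,2020,nan\n"
    "DK,Solar,2020,2.5\n"
)
missing_rows = find_missing_values(sample)
print(missing_rows)  # [2, 3]
```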

Femkemilene linked an issue on Jun 16, 2026 that may be closed by this pull request
Reflects the removal of the masterfiles folder; the code now looks at the S0 folder instead. Maybe we need to find a more robust way to do this?
These had stopped working with the new data loading changes.
Reverted the previous commit because linking the progress bar in the GUI seems to slow down the runtime significantly.

Adding a brief message for when inputs are being loaded.
The function now checks classification_titles for the list of appropriate FTTs. It is used both in data loading and when the user picks models from the dropdown box in the GUI.
@Femkemilene (Contributor)

The how-to guide on adding new variables (docs/source/howto) needs updating. Can the bullet point on Excel file additions simply be removed, or is there something new that people need to take into account when adding new variables?

cormacmlynch marked this pull request as draft on June 17, 2026, 15:05
    All simulation years.
tl_idx : dict
    Pre-built ``{year: timeline_index}`` lookup.
forstart : dict

Contributor


This variable name (forstart, or "forecast start" in VariableListing.csv) confused me in my recent tipping_point branch changes. For variables like MEWG and TEWK, the stated years are 2010 and 2017 respectively, which are before the forecast start. Should this have a different name? I can't think of a good one, but if you set this value after the historical years, the historical values are ignored.
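The behaviour described in this comment can be illustrated with a toy version of the `{year: timeline_index}` lookup: if a per-variable start year is applied as a cut-off when copying input data onto the timeline, historical values before it are silently dropped. The function `fill_from_start` is hypothetical and only mirrors the described behaviour; it is not the PR's code.

```python
def fill_from_start(values_by_year, timeline, start_year):
    """Copy input values onto a timeline, ignoring years before start_year.

    Years outside the timeline are skipped; unfilled slots stay None.
    """
    tl_idx = {year: i for i, year in enumerate(timeline)}  # {year: timeline_index}
    out = [None] * len(timeline)
    for year, value in values_by_year.items():
        if year >= start_year and year in tl_idx:
            out[tl_idx[year]] = value
    return out


timeline = list(range(2010, 2016))  # 2010..2015
data = {2010: 1.0, 2011: 1.1, 2012: 1.2}
all_years = fill_from_start(data, timeline, 2010)
late_start = fill_from_start(data, timeline, 2012)
print(all_years)   # [1.0, 1.1, 1.2, None, None, None]
print(late_start)  # [None, None, 1.2, None, None, None]
```

In this reading, the value acts more like a "first data year" than a forecast start.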

Not used and dims are incorrect
Two new functions: one checks that each CSV read in has the dimensions implied by the lengths of its titles; the other uses a new 'Is input variable' column in VariableListing.csv to check that these core variables are present in the S0 folder.
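A hedged sketch of what two such checks might look like: one compares a table's shape with the shape implied by its title lists, the other lists flagged input variables that have no CSV in S0. The function names, the 2-D table shape, the `variable` column and the accepted truthy values of 'Is input variable' are all assumptions, not the PR's implementation.

```python
import tempfile
from pathlib import Path


def check_dimensions(rows, expected_shape, var):
    """Raise ValueError if a 2-D table (data rows only, labels stripped)
    does not have the shape implied by its title lists."""
    n_rows, n_cols = expected_shape
    if len(rows) != n_rows:
        raise ValueError(f"{var}: expected {n_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != n_cols:
            raise ValueError(
                f"{var}: row {i} has {len(row)} columns, expected {n_cols}"
            )


def missing_input_files(listing_rows, s0_dir):
    """Return variables flagged as inputs in the listing that have no CSV in S0."""
    flagged = [
        r["variable"]
        for r in listing_rows
        if r["Is input variable"].strip().lower() in ("1", "true", "yes")
    ]
    return [v for v in flagged if not (Path(s0_dir) / f"{v}.csv").is_file()]


# A shape such as (2, 2) would come from e.g. len(tech_titles), len(year_titles).
check_dimensions([[1, 2], [3, 4]], (2, 2), "MEWR")  # passes silently
try:
    check_dimensions([[1, 2], [3]], (2, 2), "MEWR")
    dim_error = ""
except ValueError as e:
    dim_error = str(e)  # "MEWR: row 1 has 1 columns, expected 2"

listing = [
    {"variable": "MEWR", "Is input variable": "1"},
    {"variable": "MEWG", "Is input variable": "1"},
    {"variable": "MEWS", "Is input variable": "0"},
]
with tempfile.TemporaryDirectory() as s0:
    (Path(s0) / "MEWR.csv").write_text("...")
    missing = missing_input_files(listing, s0)  # ["MEWG"]
```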
The user is warned if they select a model and scenario combination that has no input files, and that scenario is not run.
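One plausible way to implement this warn-and-skip behaviour is sketched below: keep only scenarios that provide at least one of the model's input CSVs and emit a warning for the rest. The folder layout, the function name `runnable_scenarios` and the choice of Python's `warnings` module (rather than a GUI message) are assumptions.

```python
import tempfile
import warnings
from pathlib import Path


def runnable_scenarios(inputs_dir, scenarios, model_vars):
    """Return scenarios that provide at least one input CSV for the model.

    Scenarios with no matching files trigger a warning and are skipped.
    Assumed (hypothetical) layout: <inputs_dir>/<scenario>/<var>.csv
    """
    keep = []
    for scen in scenarios:
        scen_dir = Path(inputs_dir) / scen
        if any((scen_dir / f"{v}.csv").is_file() for v in model_vars):
            keep.append(scen)
        else:
            warnings.warn(
                f"Scenario {scen} has no input files for this model; it will not be run."
            )
    return keep


with tempfile.TemporaryDirectory() as d:
    for scen in ("S0", "S1", "S2"):
        (Path(d) / scen).mkdir()
    (Path(d) / "S0" / "MEWR.csv").write_text("...")
    (Path(d) / "S1" / "MEWR.csv").write_text("...")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        kept = runnable_scenarios(d, ["S0", "S1", "S2"], ["MEWR", "MEWG"])
```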
cormacmlynch marked this pull request as ready for review on June 18, 2026, 12:34
For new data loading and new GUI
and change PMF VariableListing domain to FTT-P instead
Development

Successfully merging this pull request may close these issues.

Delete S3 scenario
Speed up import of input files

3 participants