New data loading #307
First pass at implementing a new way to store and read in data files. This method stores variables as CSV files (one for each variable). Users are expected to edit the CSV files directly. The CSVs are then read into the model, removing the xlsx->csv step. Polars is also used in place of Pandas for speed gains (it is not yet clear how much of the speed gain is due to this swap). Also included is a script (convert_inputs_to_new_format.py) that converts old-style files to the new format. Model output has been tested for Tr, P, Fr, and H. Shares results are the same as before.
no longer needed as part of new data loading
Removing some old text left over from prototyping
@copilot: can you review all the .py files? No need to look at the file changes in the Inputs folder.
Agent-Logs-Url: https://github.com/cpmodel/FTT_StandAlone/sessions/c7808ca3-18df-4bbb-b204-6c66dc5ff412 Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Reviewed all changed |
@copilot: can you explain what you did here and why? (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py
In |
@copilot: I've recently merged a PR into main. Can you merge main into this branch, so it's easier to assess that everything works together before I start reviewing the PR?
# Conflicts: # SourceCode/model_class.py # SourceCode/support/input_functions.py Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Done — I merged |
@cormacmlynch: In the previous version, we had a lot of error messages when things went wrong. The data loading was more finicky before, so maybe it's not as urgent now, but I was wondering if it was a conscious choice to get rid of this functionality. Otherwise, it's lovely. I've tidied up slightly and hopefully increased the stability of the GUI (it gave a weird warning that Claude fixed). I think the type of data errors we need to check for:
I think that should be fairly clean to check for. My first attempt with Claude led to low-quality vibe code, so I have not committed this. Would you have time to reintroduce these two checks?
Reflects the removal of the masterfiles folder. Instead looks at S0 folder. Maybe we need to find a more robust way to do this?
These had stopped working with the new data loading changes.
This reverts commit 5421ab2.
Previous commit reverted because linking the progress bar in the GUI seems to significantly slow down the runtime. Adding a brief message for when inputs are being loaded.
Function now checks classification_titles for the list of appropriate FTTs. This function is used both in data loading and when the user is picking models from the dropdown box in the GUI.
The howto file on how to add new variables needs updating (docs / source / howto). Can the bullet point on the Excel file additions simply be removed, or is there something new people need to take into account when adding new variables?
    All simulation years.
tl_idx : dict
    Pre-built ``{year: timeline_index}`` lookup.
forstart : dict
This variable name (forstart, or forecast start in VariableListing.csv) is something I got confused by in my recent tipping_point branch changes. For variables like MEWG and TEWK, the years stated are 2010 and 2017, which are before the forecast start. Should this have a different name? I can't think of a good one, but if you set this value after the historical years, the historical values are ignored.
Not used and dims are incorrect
Two new functions. One checks that the csvs read in are the dimensions they should be according to lengths of their titles. The other references a new 'Is input variable' column in VariableListing.csv to check that these core variables are present in the S0 folder.
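For readers skimming the thread, here is a minimal sketch of what two such checks could look like, using only the standard library. The function names, the assumed CSV layout (a header row plus one labelled row per title), the `Variable name` column, and the accepted flag values are all assumptions for illustration; only the `Is input variable` column name comes from the commit message, and this is not the code added in this PR.

```python
import csv
from pathlib import Path


def check_dimensions(csv_path, row_titles, col_titles):
    """Return a list of problems if the CSV shape does not match the titles.

    Assumed layout (hypothetical): a header row, then one row per row title,
    with the row label in the first column.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return [f"{csv_path}: file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if len(body) != len(row_titles):
        problems.append(
            f"{csv_path}: expected {len(row_titles)} rows, found {len(body)}"
        )
    if len(header) - 1 != len(col_titles):
        problems.append(
            f"{csv_path}: expected {len(col_titles)} columns, found {len(header) - 1}"
        )
    return problems


def missing_input_variables(variable_listing_path, s0_dir,
                            name_col="Variable name",
                            flag_col="Is input variable"):
    """Return variables flagged as inputs that have no CSV in the S0 folder.

    name_col and the accepted flag values are assumptions, not the real schema.
    """
    with open(variable_listing_path, newline="") as f:
        required = [
            row[name_col]
            for row in csv.DictReader(f)
            if (row.get(flag_col) or "").strip().upper() in {"Y", "YES", "TRUE", "1"}
        ]
    s0_dir = Path(s0_dir)
    return [name for name in required if not (s0_dir / f"{name}.csv").exists()]
```

Both helpers return their findings as lists rather than raising, so a caller could collect every problem and report them together in one message.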
User is warned if they select a model and scenario combination that has no input files. Additionally, this scenario is not run.
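A rough sketch of this warn-and-skip behaviour, assuming a hypothetical folder layout of `<inputs_dir>/<scenario>/<model>/<variable>.csv` and a hypothetical helper name. The actual GUI and model code may organise this differently.

```python
import warnings
from pathlib import Path


def runnable_scenarios(inputs_dir, models, scenarios):
    """Return the scenarios that have at least one CSV for a selected model.

    Scenarios without any input files trigger a warning and are left out.
    """
    inputs_dir = Path(inputs_dir)
    keep = []
    for scen in scenarios:
        # True as soon as any selected model folder contains a CSV file.
        has_files = any(
            any((inputs_dir / scen / model).glob("*.csv")) for model in models
        )
        if has_files:
            keep.append(scen)
        else:
            warnings.warn(
                f"No input files found for scenario {scen!r} "
                f"with models {list(models)}; skipping it."
            )
    return keep
```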
For new data loading and new gui
and change PMF variablelisting domain to FTT-P instead
This pull request restructures how data is stored and read into the model. The main changes are in input_functions.py.
Excel-based master files and per-region, per-variable CSV files have been replaced by a single CSV file for each input variable. The benefits of this method are faster data loading (no xlsx-to-csv step and significantly fewer files to read) and a more machine-friendly data format (for version control). The Polars library is also used in place of Pandas for speed gains.
The new data loading is only implemented for the power, heat, transport, and freight models. Data files for just S0 are included for now. Scenarios work as they did before -- a full set of variable csvs is not required for a scenario, only files different from S0 (e.g., MEWR in a scenario with regulation on coal).
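The scenario fallback described above can be sketched as a simple path lookup. This assumes a hypothetical layout of `<inputs_dir>/<scenario>/<model>/<variable>.csv`; the helper name is made up for illustration and is not the function in input_functions.py.

```python
from pathlib import Path


def resolve_input_file(inputs_dir, model, scenario, variable, baseline="S0"):
    """Return the scenario's CSV for a variable, falling back to the baseline."""
    inputs_dir = Path(inputs_dir)
    # Try the scenario folder first, then the baseline folder.
    for scen in (scenario, baseline):
        candidate = inputs_dir / scen / model / f"{variable}.csv"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        f"{variable}.csv not found for {model} in {scenario!r} or {baseline!r}"
    )
```

With this lookup, a scenario folder only needs to hold the files that differ from S0, such as MEWR in a coal-regulation scenario.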