Skip to content

Update code/new data#226

Closed
dav-sonn wants to merge 3 commits intosimpaths:developfrom
dav-sonn:update-code/new-data
Closed

Update code/new data#226
dav-sonn wants to merge 3 commits intosimpaths:developfrom
dav-sonn:update-code/new-data

Conversation

@dav-sonn
Copy link
Copy Markdown
Collaborator

@dav-sonn dav-sonn commented Aug 6, 2025

Adjustments to the code to run the model with the new estimates/data. Changes in the Excel files are also required (See: #214 (comment))

Code changes to accommodate Daria's changes to the data. To make the model run, the Excel files also have to be changed (country prefix + typo in reg_fertility).
@dav-sonn dav-sonn requested a review from pbronka August 6, 2025 18:06
@andrewbaxter439
Copy link
Copy Markdown
Collaborator

this is merging 323 commits into main, including the whole develop branch. Is that what's intended, or is this best merging back to develop?

@pbronka pbronka changed the base branch from main to develop August 7, 2025 10:22
@pbronka
Copy link
Copy Markdown
Contributor

pbronka commented Aug 7, 2025

meant to go to the develop branch jointly with Daria's PR, thanks @andrewbaxter439!

Comment on lines +2381 to +2386
Ethn_White,
// Ethn_Mixed,
Ethn_Asian,
Ethn_Black,
Ethn_Other,
// Ethn_Missing,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brief check on this - this seems to be a) relabelling the enums from Ethnicity{Category} to Ethn_{Category}. Is this needed? b) excluding two of the categories from enum lookup - this would break any regressions where the other two are needed? Perhaps I'm understanding it incorrectly, we're just imminently trying to re-estimate the mental health modules and need to quickly adapt to this change if this is what it will be stably looking like.

Cheers!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are both fair remarks, thanks Andy.
The fact is that, in the regression Excel files, the latest estimations provided by @dariaple use four binary variables for ethnicity labelled Ethn_{Category}. Therefore, we modified the code accordingly.
I hope this clarifies!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see where it's come from! The other regression files, including reg_financial_distress.xlsx in develop (and mental health in other branches) are calling on the enums EthnicityMixed, EthnicityAsian, EthnicityBlack and EthnicityOther. They'll have to be aligned across excel files to one set or another I think to make the model work?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andrewbaxter439. I think that would be the cleanest solution. Technically an alternative would be to keep these variables with different names in reg_financial_distress.xlsx file and add them as enum constants with a different aggregation from the underlying 6-category ethnicity variable than the one used above for "Ethn_..." - but I think it's a bit messy?

Copy link
Copy Markdown
Contributor

@igelstorm igelstorm Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be preferable to choose one convention (Ethn_* or Ethnicity*) and use it consistently. I don't see much of an issue with some of these representing different ways of aggregating the underlying 6-category variable, as long as they're consistently and unambiguously named? So there could be an EthnicityOther that represents $dot01=Other$ and an EthnicityMissingOrOther that represents $dot01 \in (Missing, Other)$, etc.

On that basis, what's currently called Ethn_Other in this PR should perhaps be called something like Ethn_MissingOrMixedOrOther (Ethn_MissingMixedOther for short?). I appreciate this is more unwieldy but I think it's really important for these names to be as unambiguous as possible to avoid mistakes.

It might be easiest to keep the existing names (Ethnicity*), and add whatever additional aggregations are needed for the new estimates?

@andrewbaxter439
Copy link
Copy Markdown
Collaborator

meant to go to the develop branch jointly with Daria's PR

actually you might need to do a merge between the two branches to get it to pass the tests first if they're depending on each other!

These updates work with Daria's new estimates for the UK.
@dav-sonn
Copy link
Copy Markdown
Collaborator Author

Final adjustments to accommodate the latest UK estimates have been completed. The model compiles and runs correctly. This branch is ready to be merged.

I reverted the change in this commit: 913f783#diff-5519d2b0765548d40a1806f7a0acf6db8f8205065f51e0d696eea2edd111c26c, which was preventing the persister to work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants