Conversation
Code changes to accommodate Daria's changes to the data. To make the model run, the Excel files also have to be changed (country prefix + typo in reg_fertility).
|
this is merging 323 commits into main, including the whole |
|
meant to go to the develop branch jointly with Daria's PR, thanks @andrewbaxter439! |
| Ethn_White, | ||
| // Ethn_Mixed, | ||
| Ethn_Asian, | ||
| Ethn_Black, | ||
| Ethn_Other, | ||
| // Ethn_Missing, |
There was a problem hiding this comment.
Brief check on this - this seems to be a) relabelling the enums from Ethnicity{Category} to Ethn_{Category}. Is this needed? b) excluding two of the categories from enum lookup - this would break any regressions where the other two are needed? Perhaps I'm understanding it incorrectly, we're just imminently trying to re-estimate the mental health modules and need to quickly adapt to this change if this is what it will be stably looking like.
Cheers!
There was a problem hiding this comment.
They are both fair remarks, thanks Andy.
The fact is that, in the regression Excel files, the latest estimations provided by @dariaple use four binary variables for ethnicity labelled Ethn_{Category}. Therefore, we modified the code accordingly.
I hope this clarifies!
There was a problem hiding this comment.
Ah I see where it's come from! The other regression files, including reg_financial_distress.xlsx in develop (and mental health in other branches) are calling on the enums EthnicityMixed, EthnicityAsian, EthnicityBlack and EthnicityOther. They'll have to be aligned across excel files to one set or another I think to make the model work?
There was a problem hiding this comment.
Thanks @andrewbaxter439. I think that would be the cleanest solution. Technically an alternative would be to keep these variables with different names in reg_financial_distress.xlsx file and add them as enum constants with a different aggregation from the underlying 6-category ethnicity variable than the one used above for "Ethn_..." - but I think it's a bit messy?
There was a problem hiding this comment.
Yeah, I think it would be preferable to choose one convention (Ethn_* or Ethnicity*) and use it consistently. I don't see much of an issue with some of these representing different ways of aggregating the underlying 6-category variable, as long as they're consistently and unambiguously named? So there could be an EthnicityOther that represents EthnicityMissingOrOther that represents
On that basis, what's currently called Ethn_Other in this PR should perhaps be called something like Ethn_MissingOrMixedOrOther (Ethn_MissingMixedOther for short?). I appreciate this is more unwieldy but I think it's really important for these names to be as unambiguous as possible to avoid mistakes.
It might be easiest to keep the existing names (Ethnicity*), and add whatever additional aggregations are needed for the new estimates?
actually you might need to do a merge between the two branches to get it to pass the tests first if they're depending on each other! |
These updates work with Daria's new estimates for the UK.
|
Final adjustments to accommodate the latest UK estimates have been completed. The model compiles and runs correctly. This branch is ready to be merged. |
I reverted the change in this commit: 913f783#diff-5519d2b0765548d40a1806f7a0acf6db8f8205065f51e0d696eea2edd111c26c, which was preventing the persister to work.
Adjustments to the code to run the model with the new estimates/data. Changes in the Excel files are also required (See: #214 (comment))