Skip to content

Feature/182 add statisticswellbeingjava to calculate statistics#183

Merged
andrewbaxter439 merged 19 commits intodevelopfrom
feature/182-add-statisticswellbeingjava-to-calculate-statistics
Jun 2, 2025
Merged

Feature/182 add statisticswellbeingjava to calculate statistics#183
andrewbaxter439 merged 19 commits intodevelopfrom
feature/182-add-statisticswellbeingjava-to-calculate-statistics

Conversation

@andrewbaxter439
Copy link
Copy Markdown
Collaborator

What

Why

  • gives summary statistics for health and wellbeing

Questions outstanding

  • Should this try to do subgroups by age/sex? How is this best done - by adding loops to the SimPathsCollector updates (to make multiple rows) or interacting subgrouping variables by indicator (to make lots of columns)?

@andrewbaxter439 andrewbaxter439 requested a review from Copilot May 30, 2025 10:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds aggregate wellbeing statistics functionality to the simulation, closing issue #182. Key changes include:

  • Introducing a new HealthStatistics class to calculate and store health-related statistics.
  • Modifying SimPathsCollector to integrate health statistics with event scheduling and persistence.
  • Updating processes to include DumpHealthStatistics alongside existing employment statistics.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/main/java/simpaths/experiment/SimPathsCollector.java Added fields and event scheduling to update and export health statistics.
src/main/java/simpaths/data/statistics/HealthStatistics.java New class to compute statistical aggregates for various health and wellbeing indicators.

@andrewbaxter439 andrewbaxter439 requested a review from Copilot May 30, 2025 11:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds functionality to compute and export aggregate wellbeing statistics alongside existing employment metrics. Key changes include:

  • Integration of health statistics calculations and exports in SimPathsCollector.
  • Introduction of the HealthStatistics class that computes various aggregate metrics.
  • Updates to the simulation event scheduling to include health statistics dumping.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/main/java/simpaths/experiment/SimPathsCollector.java Added parameters, HealthStatistics instance, and event scheduling for health statistics
src/main/java/simpaths/data/statistics/HealthStatistics.java Implemented calculations for multiple wellbeing metrics using cross-sectional functions

@andrewbaxter439
Copy link
Copy Markdown
Collaborator Author

I've added @justin-ven and @pbronka as reviewers - do sense-check: this should just be adding an additional optional statistics output file.

As above though, is there a good practice for how best to output subgroup statistics? Looping over nested age*sex filtered statistics to populate multiple rows? Or doing it by column?

@andrewbaxter439 andrewbaxter439 added the enhancement New feature or request label May 30, 2025
@justin-ven
Copy link
Copy Markdown
Contributor

justin-ven commented Jun 1, 2025

I don't have a clear preference over the looping vs by column - I am guessing that there will be differences in execution time that would decide the matter (at least for me). Unfortunately, I have not run associated tests myself...

Also, have you seen the copilot suggestions? They look sensible to me and I would normally implement these sorts of things before finalising on the commit.

Copy link
Copy Markdown
Contributor

@justin-ven justin-ven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard for me to say if this is all ok without actually running it myself - I struggle to know how to review these things effectively. The copilot suggestions look sensible, though again it is hard for me know without associated testing.

@andrewbaxter439
Copy link
Copy Markdown
Collaborator Author

Thanks Justin for your comments! Yeah that's great feedback - ideally wanted to check with you whether this fitted with the 'logic' of statistics calculations in SimPaths. Perhaps we make a practice of setting specific queries for requested reviewers on future PRs? Can add that to agenda for tomorrow.

I don't have a clear preference over the looping vs by column - I am guessing that there will be differences in execution time that would decide the matter (at least for me). Unfortunately, I have not run associated tests myself...

Will perhaps experiment with loop vs extra variables then. I would lean towards looping as doing separate columns for male/female/total would triple the width of the file and triple the lines of code in HealthStatistics.java in a repetitive (and harder to maintain) way? Can check if it affects time.

Also, have you seen the copilot suggestions? They look sensible to me and I would normally implement these sorts of things before finalising on the commit.

Yeah they seem like good suggestions actually. I could try to do:

  • move WELLBEING_MEASURE_ADJUSTMENT to class level (this is possibly temporary till I fix new Amend life satisfaction scores to be in range 0-10 to align with ONS use of variable #184)
  • refactor and create theoretical new calculateAndSetPercentiles and calculateMean methods, although perhaps the logic for these belongs in JAS-mine-core, creating an abstract class of StatisticsCalculation with the methods that can be inherited across all statistics classes? This might be best factored out as a new issue/idea?

Copy link
Copy Markdown
Contributor

@pbronka pbronka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all looks good to me; more generally the copilot's suggestion that the way in which we calculate statistics could likely be simplified is a good one...

@andrewbaxter439
Copy link
Copy Markdown
Collaborator Author

Have updated this with:

  • gender looping to create three rows in each dataset

Potentially for further work in separate branches:

  • the factoring out of the creating of quantiles etc. seems to make good sense, have tried out some code and will put it in a new branch. This will allow the new calculations to be fully tested against the old ones (to make sure stats outputted are identical)
  • splitting by age: this seems possible in several ways, by adding rows or columns. Rather than do this to the gender-split file though (making 3*n_age_groups rows per year), perhaps a second HealthStatistics csv file with age splits would be better? will work on that as an alternative.

Pending passing tests, will merge these stable updates.

@andrewbaxter439 andrewbaxter439 merged commit 10abc69 into develop Jun 2, 2025
4 checks passed
@andrewbaxter439 andrewbaxter439 linked an issue Jun 3, 2025 that may be closed by this pull request
@andrewbaxter439 andrewbaxter439 deleted the feature/182-add-statisticswellbeingjava-to-calculate-statistics branch June 5, 2025 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add StatisticsWellbeing.java to calculate statistics

4 participants