
Speeding up database handling in initial populations #249

Merged
andrewbaxter439 merged 23 commits into develop from bugfix/tidy-database-creation on Sep 25, 2025

@andrewbaxter439 (Collaborator) commented Sep 22, 2025

Note: this includes the changes from David's branch in #226.

What

  • This extra fix speeds up loading of initial populations, which becomes slow for early start years and as more (numeric) variables are added
  • Converts all numeric input variables to double/integer when loading initial populations into the PERSON_UK_[year] tables
  • Converts idhousehold and buid to BIGINT to match the other tables
  • Replaces the three-table joins with a single call when loading the starting population (SELECT households FROM Household households, which is very quick), and changes the corresponding getBenefitUnits and getMembers calls on the household to LAZY fetches, which only retrieve the relevant rows when they are needed.
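For reviewers less familiar with this fetch pattern, here is a minimal plain-Java sketch of the idea. It is not the SimPaths or Hibernate code: `FakeDb`, `Session`, `BenefitUnitRow` and the query counter are hypothetical stand-ins, and the behaviour being imitated is Hibernate's `FetchType.LAZY` combined with `FetchMode.SUBSELECT` (the fetch mode Copilot's summary lists for `Household` and `BenefitUnit`). Households load in one query with no joins; the first call to `getBenefitUnits` then loads the benefit units for every household from that original query in one further query, rather than one query per household.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java imitation (NOT Hibernate) of LAZY + SUBSELECT fetching.
 * All class and field names here are hypothetical stand-ins.
 */
public class LazySubselectSketch {

    /** One benefit-unit row: its own id and the household it belongs to. */
    record BenefitUnitRow(long buid, long idhousehold) {}

    /** In-memory stand-in for the database; each load method counts as one query. */
    static class FakeDb {
        int queryCount = 0;
        private final List<Long> householdIds;
        private final List<BenefitUnitRow> benefitUnits;

        FakeDb(List<Long> householdIds, List<BenefitUnitRow> benefitUnits) {
            this.householdIds = householdIds;
            this.benefitUnits = benefitUnits;
        }

        /** Analogue of "SELECT households FROM Household households": one query, no joins. */
        List<Long> loadHouseholds() {
            queryCount++;
            return new ArrayList<>(householdIds);
        }

        /** Analogue of the SUBSELECT fetch: one query covering all the given households. */
        List<BenefitUnitRow> loadBenefitUnitsFor(List<Long> ids) {
            queryCount++;
            List<BenefitUnitRow> rows = new ArrayList<>();
            for (BenefitUnitRow row : benefitUnits) {
                if (ids.contains(row.idhousehold())) {
                    rows.add(row);
                }
            }
            return rows;
        }
    }

    /** Holds the loaded households; benefit units stay unloaded until first requested. */
    static class Session {
        private final FakeDb db;
        private final List<Long> households;
        private Map<Long, List<BenefitUnitRow>> benefitUnitsByHousehold; // null = not fetched yet

        Session(FakeDb db) {
            this.db = db;
            this.households = db.loadHouseholds();
        }

        /** Analogue of a LAZY getBenefitUnits(): the first call fills every household at once. */
        List<BenefitUnitRow> getBenefitUnits(long idhousehold) {
            if (benefitUnitsByHousehold == null) {
                benefitUnitsByHousehold = new HashMap<>();
                for (long id : households) {
                    benefitUnitsByHousehold.put(id, new ArrayList<>());
                }
                for (BenefitUnitRow row : db.loadBenefitUnitsFor(households)) {
                    benefitUnitsByHousehold.get(row.idhousehold()).add(row);
                }
            }
            return benefitUnitsByHousehold.getOrDefault(idhousehold, List.of());
        }
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb(
                List.of(1L, 2L, 3L),
                List.of(new BenefitUnitRow(10L, 1L),
                        new BenefitUnitRow(11L, 1L),
                        new BenefitUnitRow(20L, 2L)));
        Session session = new Session(db);
        System.out.println("Queries after loading households: " + db.queryCount);              // 1
        System.out.println("Household 1 benefit units: " + session.getBenefitUnits(1L).size()); // 2
        System.out.println("Household 3 benefit units: " + session.getBenefitUnits(3L).size()); // 0
        System.out.println("Total queries: " + db.queryCount);                                  // 2
    }
}
```

The same idea applies to `getMembers`. One trade-off of lazy collections in Hibernate is that they can only be initialised while the session that loaded the households is still open.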

Why

  • As noted in "Reducing the initialisation time" (#246), the runtime for UK models slows down considerably for a 2015 start year and dramatically for 2011, with run setup taking ~200 minutes
  • This tweak cuts the loadInitialPopulations time to ~2 minutes
  • If this is a suitable fix, it will likely be needed before any further variables are added

Please review and see if this is a sensible amendment @dav-sonn @justin-ven @igelstorm @Mariia-Var

dav-sonn and others added 12 commits August 6, 2025 17:28
Code changes to accommodate Daria's changes to the data. To make the model run, the Excel files also have to be changed (country prefix + typo in reg_fertility).
These updates work with Daria's new estimates for the UK.
I reverted the change in this commit: 913f783#diff-5519d2b0765548d40a1806f7a0acf6db8f8205065f51e0d696eea2edd111c26c, which was preventing the persister from working.
Signed-off-by: Andy Baxter <andrewbaxter439@gmail.com>
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes database handling for initial population loading to reduce simulation startup time, particularly for early years and models with more numeric variables. The changes improve performance by converting numeric variables to appropriate database types and optimizing database fetch strategies.

Key Changes

  • Converts numeric input variables (mental/physical health scores, satisfaction indices) to proper database types (DOUBLE/INT) and changes ID columns to BIGINT for consistency
  • Optimizes database queries by removing eager JOINs from initial population loading and switching to lazy loading with SUBSELECT fetch mode
  • Updates regression types from OrderedRegression to GeneralisedOrderedRegression for health and education models
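For context on the regression change, a minimal sketch of the textbook generalised ordered logit: each threshold j has its own coefficient vector, so P(Y > j | x) = logistic(alpha_j + x·beta_j), and the ordered logit is the special case where every beta_j is the same. This illustrates the model family only and is an assumption about it: the class and method names are hypothetical, and SimPaths' `GeneralisedOrderedRegression` may use a different parameterisation (for example, different sign conventions on the thresholds).

```java
import java.util.Arrays;

/**
 * Textbook generalised ordered logit (hypothetical names, not the SimPaths class).
 * For thresholds j = 0..M-2: P(Y > j) = logistic(alpha[j] + x . beta[j]).
 */
public class GeneralisedOrderedLogitSketch {

    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns P(Y = k) for each of the alpha.length + 1 ordered categories k. */
    static double[] categoryProbabilities(double[] x, double[] alpha, double[][] beta) {
        int thresholds = alpha.length;
        double[] atLeast = new double[thresholds + 2]; // atLeast[k] = P(Y >= k)
        atLeast[0] = 1.0;
        atLeast[thresholds + 1] = 0.0;
        for (int j = 0; j < thresholds; j++) {
            double z = alpha[j];
            for (int k = 0; k < x.length; k++) {
                z += x[k] * beta[j][k]; // threshold-specific coefficients
            }
            atLeast[j + 1] = logistic(z); // P(Y > j) = P(Y >= j + 1)
        }
        double[] p = new double[thresholds + 1];
        for (int k = 0; k <= thresholds; k++) {
            p[k] = atLeast[k] - atLeast[k + 1];
        }
        return p;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5};
        double[] alpha = {1.0, -1.0};
        // Generalised: a separate coefficient vector per threshold.
        double[][] generalised = {{0.4, -0.2}, {0.9, 0.1}};
        // Ordered logit (proportional odds): the same vector repeated at every threshold.
        double[][] ordered = {{0.4, -0.2}, {0.4, -0.2}};
        System.out.println("generalised: " + Arrays.toString(categoryProbabilities(x, alpha, generalised)));
        System.out.println("ordered:     " + Arrays.toString(categoryProbabilities(x, alpha, ordered)));
    }
}
```

Because the thresholds no longer share one coefficient vector, the cumulative probabilities are not guaranteed to decrease in j for every x, so a generalised model can yield negative category probabilities for unusual covariate values; an ordered logit with ordered thresholds cannot.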

Reviewed Changes

Copilot reviewed 13 out of 31 changed files in this pull request and generated 3 comments.

Summary per file:
  • src/main/resources/META-INF/persistence.xml: Adds new entity mappings and ORM configuration file
  • src/main/java/simpaths/model/enums/Ethnicity.java: Adds Missing ethnicity category
  • src/main/java/simpaths/model/SimPathsModel.java: Updates time trend parameters and simplifies initial population query
  • src/main/java/simpaths/model/Person.java: Adds partner health variables, new lag variables, and extensive enum additions
  • src/main/java/simpaths/model/Household.java: Changes fetch strategy from EAGER to LAZY with SUBSELECT
  • src/main/java/simpaths/model/BenefitUnit.java: Changes fetch strategy from EAGER to LAZY with SUBSELECT
  • src/main/java/simpaths/experiment/SimPathsMultiRun.java: Fixes missing country assignment in initialization logic
  • src/main/java/simpaths/experiment/SimPathsCollector.java: Adds null check for median calculation
  • src/main/java/simpaths/data/startingpop/DataParser.java: Updates database schema with proper data types and column conversions
  • src/main/java/simpaths/data/RegressionName.java: Changes regression types to GeneralisedOrderedLogit
  • src/main/java/simpaths/data/Parameters.java: Updates regression objects and column counts for new model parameters
  • src/main/java/simpaths/data/ManagerRegressions.java: Moves regression handling from OrderedRegression to GeneralisedOrderedRegression
  • config/default.yml: Reduces default population size for testing
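A minimal sketch of the kind of conversion the DataParser entry describes: building a `CREATE TABLE ... AS SELECT` statement that casts selected columns of a staging table into a `PERSON_UK_[year]` table. This is an assumption about the approach, not the PR's code: the staging table, the `person_id` column and the two numeric column names are hypothetical placeholders; only `idhousehold`, `buid` and the target types (BIGINT, DOUBLE, INT) come from this PR. The exact spelling of type names depends on the database engine.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Builds a CREATE TABLE ... AS SELECT that casts chosen columns (hypothetical helper). */
public class PersonTableCastSketch {

    /**
     * @param year         start year, giving the PERSON_UK_[year] table name
     * @param stagingTable hypothetical table holding the raw imported rows
     * @param passThrough  columns copied unchanged
     * @param casts        column name -> target SQL type, in output order
     */
    static String createPersonTable(int year, String stagingTable,
                                    List<String> passThrough, Map<String, String> casts) {
        StringJoiner select = new StringJoiner(", ");
        for (String column : passThrough) {
            select.add(column);
        }
        for (Map.Entry<String, String> e : casts.entrySet()) {
            // CAST converts the stored type; the alias keeps the original column name.
            select.add("CAST(" + e.getKey() + " AS " + e.getValue() + ") AS " + e.getKey());
        }
        return "CREATE TABLE PERSON_UK_" + year + " AS SELECT " + select + " FROM " + stagingTable;
    }

    public static void main(String[] args) {
        Map<String, String> casts = new LinkedHashMap<>(); // keeps insertion order
        casts.put("idhousehold", "BIGINT");
        casts.put("buid", "BIGINT");
        casts.put("mental_health_score", "DOUBLE"); // hypothetical column name
        casts.put("life_satisfaction", "INT");      // hypothetical column name
        System.out.println(createPersonTable(2011, "person_staging", List.of("person_id"), casts));
    }
}
```

Casting once, when the starting-population tables are built, keeps `idhousehold` and `buid` the same type (BIGINT) as in the other tables, which the PR description gives as the reason for that change.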
Files not reviewed (1)
  • .idea/sqldialects.xml: Language not supported

@andrewbaxter439 (Collaborator, Author) left a comment

If this pull request could be tidied up and resolved, would it actually supersede #226 while also solving these problems?

dav-sonn and others added 3 commits September 24, 2025 12:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@andrewbaxter439 merged commit f64663c into develop on Sep 25, 2025
5 checks passed
@andrewbaxter439 deleted the bugfix/tidy-database-creation branch on January 6, 2026 at 10:35

Labels

None yet

Projects

None yet

3 participants