Tools for backward incompatible DB upgrades #1833
Open
gpetretto wants to merge 1 commit into
As mentioned in #48, we would like to introduce a backward-incompatible change that splits the blocks data from the items. We therefore propose a general procedure for handling backward-incompatible upgrades to the DB (see #1184).
The whole implementation is heavily based on the one we developed for jobflow-remote and should be used only when truly needed, as we recognize the pain and the potential issues associated with DB migrations.
Overview
The main change is the addition of the `upgrade` module, which contains a `DatabaseUpgrader`. A new migration requires writing a function that performs the required updates on the DB and decorating it with `@DatabaseUpgrader.register_upgrade("X.Y.Z")`, where `"X.Y.Z"` is the new version that will be released.
In order to determine the datalab version associated with the DB, a new collection (`database_metadata`) is created, in which a document stores the current datalab version number. When the code is updated to a newer version, the upgrade procedure reads the datalab version stored in the DB, compares it with the version of the code being executed, sequentially applies all the intermediate upgrades to the DB, and finally updates the document in `database_metadata` with the current datalab version. This allows incremental upgrades to be built between different versions. The upgrade can be executed with the invoke task `migration.upgrade`.
Initialization
A downside of the current proposal is that storing the datalab version in the DB requires an initialization. This is done through the invoke task `admin.initialise-schema-version`, which needs to be executed during the initial deployment. We considered doing this automatically in `create_app`, but the main issue is telling apart a DB associated with an already existing deployment from a pristine one, since in both cases `database_metadata` will be empty. In principle one could define a procedure to guess the status of the DB (e.g., no items are present, or all the collections are empty), but it is easy to imagine cases where these checks would lead to the wrong conclusion. For example, checking that all the collections are empty may fail to recognize a new deployment if, in the future, someone adds some other metadata document, or if a user is added to the DB before the server is started. We therefore ended up proposing this manual initialization procedure. It would be easy to switch to an automated one if preferred, or if a reliable criterion can be determined.
Example
As an example of how to create a new upgrade, here is a draft that modifies `relationships` to be based on `refcode` instead of `item_id` (see #1184). I don't think it properly covers all the cases, and I did not attempt to make it complete, as that is beyond the scope of this PR; it is just an idea of how this would work. More examples can be found in jobflow-remote, whose implementation is very similar.
relationships upgrade example
TODO
Document the upgrade and initialization procedures if approved.
Let us know what you think or if you need more details.