Fix potential data loss saving plugin properties #195
Conversation
This addresses an upsert race condition that occurred when saving plugin properties (e.g., Data Pruner settings, third-party plugins) in environments with a read/write split database configuration where the read-only connection points to a replica.

The Problem:
The prior code attempted to determine whether to INSERT or UPDATE by first checking for the property's existence using the read-only database connection. Since updating all properties for a plugin involves deleting them all first, if this DELETE operation had not yet propagated to the replica, the read-only check would incorrectly indicate the property still existed.

The Result:
An UPDATE statement would be attempted, which would fail to match any rows (since the data had already been deleted from the primary) and silently return zero rows updated. This failure was not being checked, leading to data loss for the affected property.

The Solution:
This change eliminates the preliminary read check. It now attempts an UPDATE first. If the update affects zero rows, a guaranteed INSERT is performed. This pattern ensures atomicity and correctness regardless of replication latency.

See https://sqlperformance.com/2020/09/locking/upsert-anti-pattern

Issue: Innovar-Healthcare/BridgeLink#66

Signed-off-by: Tony Germano <tony@germano.name>
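The actual fix lives in Java/MyBatis mapper code; the following is only a minimal Python/sqlite3 sketch of the UPDATE-first pattern described above. The table and column names (`configuration`, `category`, `name`, `value`) are simplified assumptions, not the project's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE configuration ("
    " category TEXT NOT NULL, name TEXT NOT NULL, value TEXT,"
    " PRIMARY KEY (category, name))"
)

def save_property(conn, category, name, value):
    # Attempt the UPDATE first -- no preliminary existence check, so a
    # stale read replica can never steer us to the wrong branch.
    cur = conn.execute(
        "UPDATE configuration SET value = ? WHERE category = ? AND name = ?",
        (value, category, name),
    )
    if cur.rowcount == 0:
        # Zero rows affected on the primary: the row must be inserted.
        conn.execute(
            "INSERT INTO configuration (category, name, value) VALUES (?, ?, ?)",
            (category, name, value),
        )
    conn.commit()

save_property(conn, "Data Pruner", "pruningEnabled", "true")   # first call inserts
save_property(conn, "Data Pruner", "pruningEnabled", "false")  # second call updates
value = conn.execute(
    "SELECT value FROM configuration WHERE category = ? AND name = ?",
    ("Data Pruner", "pruningEnabled"),
).fetchone()[0]
print(value)  # -> false
```

The key point is that `rowcount` comes from the primary's own UPDATE, so the decision to INSERT is driven by authoritative data rather than a possibly lagging replica.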
mgaffigan
left a comment
LGTM
pacmano1
left a comment
I would expect the "fix" to be never sending requests to the read-only replica for anything related to configuration. Is that programmatically a much heavier lift? e.g. "if the ultimate intention of an action is to update something, never use the read-only replica".
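The routing idea suggested here could be sketched as follows. This is purely illustrative: `ConnectionRouter`, `choose`, and both flag names are hypothetical, not anything in the codebase.

```python
class ConnectionRouter:
    """Hypothetical sketch: send configuration-related operations to the
    primary connection, even when the individual statement is a read."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def choose(self, is_write=False, touches_configuration=False):
        # A read that feeds a later write, or that touches configuration
        # at all, must see the primary; otherwise replication lag can
        # surface stale rows (the bug this PR fixes).
        if is_write or touches_configuration:
            return self.primary
        return self.replica

router = ConnectionRouter(primary="primary-conn", replica="replica-conn")
print(router.choose(touches_configuration=True))  # -> primary-conn
print(router.choose())                            # -> replica-conn
```

The cost of this approach is that every call site must declare its intent up front, which is why removing the read entirely (as this PR does) is the lighter lift.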
Code LGTM. I reviewed the mappings to ensure there wasn't some funky SQL that would interfere with this solution on other DBs. Looks OK. This was a review, though, and not actively tested across the supported DB engines. I have some questions:
Consider the function definition. Would implementing something like
This change removes the query against the read-only replica by removing the existence query entirely. Instead, the UPDATE replaces the existence check by either updating the row (at which point it's finished) or returning 0 rows affected (indicating an insert is needed).
In re: testing, this sort of TOCTOU/data-propagation bug is notoriously hard to test for. I would suggest the best practice is to rely on atomic statements without trying to make data consistency guarantees ourselves. (Put differently: make it the DBMS's problem, but do so in the simplest possible way to avoid implementation issues.) For an example of how to prove data consistency correctness, see TLA+, but I don't imagine it is practical to do so here.
I agree with this thought, but I don't think it should block this PR. This PR still moves us to a better place and does not appear to introduce any new undesirable behavior. I'm not sure what the "goal" is with the read replicas - are they meant for HA or for offloading dashboard queries? Most real-world traffic I've seen from Mirth is 90% writes (excluding dashboard searches).
pacmano1
left a comment
Will approve, but wrapping these operations in transactions might be a longer-term objective.
kayyagari
left a comment
A transaction boundary would prevent any other race conditions that might arise when the same database is used by multiple OIE instances.
Performing all these operations on an `SqlSession` (`SqlConfig.getInstance().getSqlSessionManager().openSession(TransactionIsolationLevel.READ_COMMITTED)`) would be a correct approach.
I agree that the current fix is good enough to roll out immediately and the said enhancement can be made later.
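The transaction-boundary enhancement suggested above would be done in Java via the MyBatis `SqlSession`; as a language-neutral illustration, here is a Python/sqlite3 sketch of wrapping the whole delete-and-reinsert in one transaction. The function name, schema, and `BEGIN IMMEDIATE` locking mode are assumptions for the sketch, not the project's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
conn.execute("CREATE TABLE configuration (category TEXT, name TEXT, value TEXT)")

def replace_plugin_properties(conn, category, props):
    # One transaction around the whole delete-and-reinsert, so another
    # instance sharing the database never observes a half-written set.
    conn.execute("BEGIN IMMEDIATE")  # take the write lock for the full unit
    try:
        conn.execute("DELETE FROM configuration WHERE category = ?", (category,))
        conn.executemany(
            "INSERT INTO configuration (category, name, value) VALUES (?, ?, ?)",
            [(category, n, v) for n, v in props.items()],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the previous property set intact
        raise

replace_plugin_properties(conn, "Data Pruner", {"a": "1", "b": "2"})
replace_plugin_properties(conn, "Data Pruner", {"a": "3"})
rows = conn.execute("SELECT name, value FROM configuration").fetchall()
print(rows)  # -> [('a', '3')]
```

With the transaction in place, readers see either the old complete set or the new complete set, which also subsumes the UPDATE-vs-INSERT concern for a single writer.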