More reliably abort commands if client disconnects. #2635

Merged
danieldoglas merged 9 commits into main from tyler-more-disconnect-abort-checks
Jun 12, 2026
Conversation

@tylerkaraszewski

@tylerkaraszewski tylerkaraszewski commented Jun 11, 2026

Contributor

Details

This extends our abort handling for commands whose client has disconnected. Previously we only checked inside SQL queries; now we also check before and after each command phase, and during HTTPS requests.
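
As a rough illustration of the idea in this description, here is a minimal sketch of checking an abort flag before and after each command phase. Only the notion of a `shouldAbort` flag set on disconnect comes from this PR; `SketchCommand`, `AbortedException`, `throwIfAborted`, `Phase`, and `runPhases` are hypothetical names and are not taken from the Bedrock source.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command; not the real BedrockCommand.
struct SketchCommand {
    // Assumed to be set by the socket-handling thread when the client disconnects.
    std::atomic<bool> shouldAbort{false};
    // Names of the phases that ran to completion, kept only for illustration.
    std::vector<std::string> completedPhases;
};

// Thrown when the client has gone away and the remaining work should be skipped.
struct AbortedException : std::runtime_error {
    AbortedException() : std::runtime_error("client disconnected") {}
};

// Assumed helper: throw if the abort flag has been set.
inline void throwIfAborted(const SketchCommand& command) {
    if (command.shouldAbort.load()) {
        throw AbortedException();
    }
}

// A named phase of command execution (hypothetical type).
using Phase = std::pair<std::string, std::function<void(SketchCommand&)>>;

// Runs each phase with an abort check before and after it.
// Returns true if every phase completed, false if the command was aborted.
inline bool runPhases(SketchCommand& command, const std::vector<Phase>& phases) {
    try {
        for (const Phase& phase : phases) {
            throwIfAborted(command);  // check before the phase
            phase.second(command);
            throwIfAborted(command);  // check after the phase
            command.completedPhases.push_back(phase.first);
        }
    } catch (const AbortedException&) {
        // A real server would roll back any open transaction here.
        return false;
    }
    return true;
}
```

In this sketch a disconnect noticed at any check stops the remaining phases, so a long-running command does not keep working for a client that is no longer listening.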

Fixed Issues

Fixes https://expensify.slack.com/archives/C0B96MS9B8X/p1781194408074179

Tests


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

@tylerkaraszewski tylerkaraszewski self-assigned this Jun 11, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12d3496f43

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread BedrockCore.cpp Outdated
@tylerkaraszewski

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce53637b5d

Comment thread BedrockCore.cpp Outdated
@tylerkaraszewski

Contributor Author

@codex review

@tylerkaraszewski tylerkaraszewski changed the title from "[WIP] Do a better job of aborting commands on disconnect" to "More reliably abort commands if client disconnects." Jun 11, 2026
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!

danieldoglas previously approved these changes Jun 12, 2026

@danieldoglas danieldoglas left a comment

Contributor

LGTM.

Should we try to set this after acquiring the lock in SQLite::prepare? That way, even if we acquired the lock after waiting 60s, we can abort without committing if the client disconnected.

@danieldoglas

Contributor

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

Reviewed commit: f327c2b997

@tylerkaraszewski

Contributor Author

> Should we try to set this after acquiring the lock in SQLite::prepare? That way, even if we acquired the lock after waiting 60s, we can abort without committing if the client disconnected.

So I think this is a good idea, and I thought about adding it, but I'm OK omitting or at least deferring it. It requires passing the command, or a pointer to the shouldAbort flag, all the way down into SQLite::prepare, which isn't that bad but is a bit messy, and I don't really love it from an encapsulation standpoint. Also, once we've waited a long time for the commit lock, it seems like a waste not to do anything with it; we're basically done at that point.

But I think what I'd like more is to adjust this to be interruptible so that instead of potentially waiting 20 seconds for the lock and then doing nothing with it, we could at least interrupt waiting for the lock when the client disconnects. This is also a little bigger change.

What if we do this as a followup?
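
For reference, one possible shape for that interruptible wait is sketched below. It assumes the commit lock can be modeled as a `std::timed_mutex` and that the disconnect signal is a `std::atomic<bool>`; the actual Bedrock lock type is not shown in this thread, and `lockUnlessAborted` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Tries to take `lock` for up to `timeout`, waking every `slice` to check whether
// the client has disconnected. Returns true only if the lock was acquired, in which
// case the caller owns the lock and is responsible for unlocking it.
inline bool lockUnlessAborted(std::timed_mutex& lock,
                              const std::atomic<bool>& shouldAbort,
                              std::chrono::milliseconds timeout,
                              std::chrono::milliseconds slice = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!shouldAbort.load()) {
        const auto now = std::chrono::steady_clock::now();
        if (now >= deadline) {
            return false;  // timed out without getting the lock
        }
        const auto remaining =
            std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
        // Wait for at most one slice so the abort flag is re-read regularly.
        if (lock.try_lock_for(std::min(slice, remaining))) {
            return true;
        }
    }
    return false;  // client disconnected before or while we were waiting
}
```

Polling in short slices is the simplest option; a condition variable notified on disconnect would avoid the polling delay at the cost of more plumbing.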

@tylerkaraszewski

Contributor Author

@danieldoglas - updated, comments addressed.

@tylerkaraszewski

Contributor Author

@codex review

@danieldoglas

Contributor

> But I think what I'd like more is to adjust this to be interruptible so that instead of potentially waiting 20 seconds for the lock and then doing nothing with it, we could at least interrupt waiting for the lock when the client disconnects. This is also a little bigger change.

I agree with this. Let's address in a followup then!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 387844dc5c

Comment thread BedrockCore.cpp
command->_inDBWriteOperation = true;
command->process(_db);
command->_inDBWriteOperation = false;
_throwIfAborted(command);

P2 Badge Check aborts again before handing writes to commit

If a client disconnects after this new post-process() check passes but before the worker/sync commit path actually commits the transaction, handleSocket can still set shouldAbort while the command is waiting for the commit lock. I checked the worker and sync commit sections in BedrockServer.cpp: after processCommand() returns NEEDS_COMMIT they only call getRemainingTime() and commit(), with no abort check or abort ref. In that timing window the write is still committed and only the reply send fails, so long commit-lock waits or slow commits can still persist changes for a disconnected client despite this change's rollback-on-disconnect behavior.
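
A minimal sketch of the kind of pre-commit check this comment is asking for might look like the following. `SketchTransaction`, `WriteOutcome`, and `commitUnlessAborted` are hypothetical names used only for illustration; they do not correspond to the actual code in BedrockServer.cpp.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a database handle with an open write transaction.
struct SketchTransaction {
    bool committed = false;
    bool rolledBack = false;
    void commit() { committed = true; }
    void rollback() { rolledBack = true; }
};

enum class WriteOutcome { Committed, RolledBack };

// Assumed helper: re-check the abort flag one more time right before committing.
inline WriteOutcome commitUnlessAborted(SketchTransaction& txn,
                                        const std::atomic<bool>& shouldAbort) {
    if (shouldAbort.load()) {
        txn.rollback();
        return WriteOutcome::RolledBack;
    }
    // A disconnect can still land between the check above and the commit below;
    // the check narrows the window, it does not close it.
    txn.commit();
    return WriteOutcome::Committed;
}
```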

Contributor Author

This is the case we are addressing in a followup; I think it's fine.

@danieldoglas danieldoglas left a comment

Contributor

LGTM, just one question. Not sure if we need to discuss that here or in Slack.

Comment thread BedrockCommand.cpp
Comment on lines 126 to +149
@@ -140,7 +146,7 @@ void BedrockCommand::_waitForHTTPSRequests()
requestIt++;
}

- // Timed everything out, can return.
+ // Timed everything out (or abandoned them), can return.

Contributor

While I think these changes make sense, I'm kind of conflicted about aborting the command if we did HTTP requests. This could be a problem for cases like when we do billing, right?

Contributor Author

I know what you mean, but there's no real guarantee these succeed anyway. They can time out and fail after an HTTPS request, or we can send an HTTPS request and get disconnected from the remote server while receiving the response. Billing is designed to be resilient to failures, so if we start abandoning commands mid-billing, we could re-run it. It doesn't actually affect normal billing, though, because those commands are fire-and-forget and are exempt from being cancelled.
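
To make the behavior discussed here concrete, the sketch below shows a wait loop that abandons outstanding requests when the client disconnects, unless the command is marked exempt (as fire-and-forget commands are described above). It is not the real `_waitForHTTPSRequests`; `WaitResult`, `SketchWaitOptions`, `waitForRequests`, and the polling design are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

enum class WaitResult { Completed, TimedOut, Abandoned };

// Hypothetical options for the wait loop.
struct SketchWaitOptions {
    bool exemptFromAbort = false;                 // e.g. fire-and-forget commands
    std::chrono::milliseconds timeout{1000};      // overall time budget
    std::chrono::milliseconds pollInterval{5};    // how often to re-check
};

// Polls `allRequestsDone` until it returns true, the timeout expires, or the
// client disconnects (unless the command is exempt from being aborted).
inline WaitResult waitForRequests(const std::function<bool()>& allRequestsDone,
                                  const std::atomic<bool>& shouldAbort,
                                  const SketchWaitOptions& options) {
    const auto deadline = std::chrono::steady_clock::now() + options.timeout;
    while (!allRequestsDone()) {
        if (!options.exemptFromAbort && shouldAbort.load()) {
            return WaitResult::Abandoned;  // client is gone, stop waiting
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return WaitResult::TimedOut;
        }
        std::this_thread::sleep_for(options.pollInterval);
    }
    return WaitResult::Completed;
}
```

With this shape, an exempt command keeps waiting for its responses even after the client has gone away, while a normal command gives up as soon as the disconnect is noticed.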

@danieldoglas danieldoglas merged commit 95205e6 into main Jun 12, 2026
8 checks passed
@danieldoglas danieldoglas deleted the tyler-more-disconnect-abort-checks branch June 12, 2026 19:40
2 participants