Free listening port earlier in shutdown process by fukawi2 · Pull Request #2625 · Expensify/Bedrock

fukawi2 · 2026-06-02T03:43:42Z

Close the SQLiteNode peer listening port (-nodeHost) early during shutdown so the kernel releases it before the database memory is unmapped, instead of holding it bound for the duration of the unmap().

When a node shuts down with a large memory-mapped database, the kernel keeps the peer listening port bound while it unmaps that memory, which can take a long time and prevents a new process starting.

The port was held because ~SQLiteNode deletes pluginDB (a separate handle whose sqlite3_close() unmaps its pages) in the destructor body, while _port is a member only destroyed after the body returns. This closes _port before delete pluginDB in the destructor, and also frees it in beginShutdown() so it's released early and deterministically on the graceful path, with update() guarded so it won't reopen the port while shutting down.

Fixed Issues

Fixes https://expensify.slack.com/archives/C094TGUTZ/p1780360333816159?thread_ts=1780356276.321669&cid=C094TGUTZ

Tests

tylerkaraszewski

I think the diagnosis of this issue is wrong, or at least incomplete.

tylerkaraszewski · 2026-06-02T22:03:52Z

+        // listening socket), and the database teardown that follows can take a long time when the memory-mapped
+        // database is large. Until the FD is closed, the kernel keeps the listening port bound while it unmaps that
+        // memory, which prevents a restarted node from rebinding the port.
+        _port = nullptr;


This is broken, we shouldn't free this until cluater communication is done.

Cluster communication should be being done over established connections though? This is only closing the listening port for new connections?

Existing connections are not guaranteed to stay open.

tylerkaraszewski · 2026-06-02T22:15:43Z

+    // it unmaps that handle's memory-mapped pages, which can take a long time when the database is large. _port is a
+    // member and would otherwise only be destroyed after this destructor body returns, leaving the kernel holding the
+    // listening port bound for the duration of that unmap.
+    _port = nullptr;


This doesn't do anything. Well, it does what it says, which is to move the port freeing to before deleting pluginDB, but deleting pluginDB is not the slow part of this that you're worried about.

The show part of closing the DB is closing the last existing filehandle to it. This is owned by BedrockServer::_dbPool.

This currently gets deleted at the end of BedrockServer::sync(), but it's right after we delete the syncNode (by calling atomic_store for it and passing nullptr), which should release the peer port.

If the peer port is not released before the final DB deletion starts, it is probably because some other thread is holding a reference to _syncNode, but even that shouldn't be an issue, as that thread can release it while the deletion happens.

Yeah I'll be honest, I'm out of my depth. What I saw yesterday when bouncing the cluster was that port 4445 was the one that was held during unmmap() - but only on the hosts that had their command port closed when shutdown was initiated. I'm not sure if that really helps knowing though.

Do you have the time and date and host for when this happened so I can look at the logs?

db2.lax Logs

db1.rno Logs

tylerkaraszewski · 2026-06-04T17:28:31Z

Does this seem likely to be the log line for when db2.lax was shut down and this issue started on that node?

2026-06-02T00:27:35.987610+00:00 db2.lax bedrock: xxxxxx we@dont.know (BedrockServer.cpp:2159) _beginShutdown [main] [info] Beginning graceful shutdown due to 'Terminated', closing command port on '0.0.0.0:4444'.

fukawi2 · 2026-06-04T22:44:59Z

I can't be positive since it was in the middle of a long saltfab run, but it seems highly likely.

Free listening port earlier in shutdown process

65e0daf

fukawi2 requested a review from tylerkaraszewski June 2, 2026 03:43

fukawi2 self-assigned this Jun 2, 2026

tylerkaraszewski requested changes Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Free listening port earlier in shutdown process#2625

Free listening port earlier in shutdown process#2625
fukawi2 wants to merge 1 commit into
mainfrom
phs/freePortEarlier

fukawi2 commented Jun 2, 2026

Uh oh!

tylerkaraszewski left a comment

Uh oh!

tylerkaraszewski Jun 2, 2026

Uh oh!

fukawi2 Jun 2, 2026

Uh oh!

tylerkaraszewski Jun 3, 2026

Uh oh!

tylerkaraszewski Jun 2, 2026

Uh oh!

fukawi2 Jun 2, 2026

Uh oh!

tylerkaraszewski Jun 3, 2026

Uh oh!

fukawi2 Jun 4, 2026

Uh oh!

tylerkaraszewski commented Jun 4, 2026

Uh oh!

fukawi2 commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fukawi2 commented Jun 2, 2026

Fixed Issues

Tests

Uh oh!

tylerkaraszewski left a comment

Choose a reason for hiding this comment

Uh oh!

tylerkaraszewski Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

fukawi2 Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

tylerkaraszewski Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

tylerkaraszewski Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

fukawi2 Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

tylerkaraszewski Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

fukawi2 Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

tylerkaraszewski commented Jun 4, 2026

Uh oh!

fukawi2 commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants