Skip to content

Free listening port earlier in shutdown process#2625

Open
fukawi2 wants to merge 1 commit into
mainfrom
phs/freePortEarlier
Open

Free listening port earlier in shutdown process#2625
fukawi2 wants to merge 1 commit into
mainfrom
phs/freePortEarlier

Conversation

@fukawi2

@fukawi2 fukawi2 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Close the SQLiteNode peer listening port (-nodeHost) early during shutdown so the kernel releases it before the database memory is unmapped, instead of holding it bound for the duration of the unmap().

When a node shuts down with a large memory-mapped database, the kernel keeps the peer listening port bound while it unmaps that memory, which can take a long time and prevents a new process starting.

The port was held because ~SQLiteNode deletes pluginDB (a separate handle whose sqlite3_close() unmaps its pages) in the destructor body, while _port is a member only destroyed after the body returns. This closes _port before delete pluginDB in the destructor, and also frees it in beginShutdown() so it's released early and deterministically on the graceful path, with update() guarded so it won't reopen the port while shutting down.

Fixed Issues

Fixes https://expensify.slack.com/archives/C094TGUTZ/p1780360333816159?thread_ts=1780356276.321669&cid=C094TGUTZ

Tests

image image

@fukawi2 fukawi2 requested a review from tylerkaraszewski June 2, 2026 03:43
@fukawi2 fukawi2 self-assigned this Jun 2, 2026

@tylerkaraszewski tylerkaraszewski left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the diagnosis of this issue is wrong, or at least incomplete.

// listening socket), and the database teardown that follows can take a long time when the memory-mapped
// database is large. Until the FD is closed, the kernel keeps the listening port bound while it unmaps that
// memory, which prevents a restarted node from rebinding the port.
_port = nullptr;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken, we shouldn't free this until cluater communication is done.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster communication should be being done over established connections though? This is only closing the listening port for new connections?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing connections are not guaranteed to stay open.

// it unmaps that handle's memory-mapped pages, which can take a long time when the database is large. _port is a
// member and would otherwise only be destroyed after this destructor body returns, leaving the kernel holding the
// listening port bound for the duration of that unmap.
_port = nullptr;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do anything. Well, it does what it says, which is to move the port freeing to before deleting pluginDB, but deleting pluginDB is not the slow part of this that you're worried about.

The show part of closing the DB is closing the last existing filehandle to it. This is owned by BedrockServer::_dbPool.

This currently gets deleted at the end of BedrockServer::sync(), but it's right after we delete the syncNode (by calling atomic_store for it and passing nullptr), which should release the peer port.

If the peer port is not released before the final DB deletion starts, it is probably because some other thread is holding a reference to _syncNode, but even that shouldn't be an issue, as that thread can release it while the deletion happens.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll be honest, I'm out of my depth. What I saw yesterday when bouncing the cluster was that port 4445 was the one that was held during unmmap() - but only on the hosts that had their command port closed when shutdown was initiated. I'm not sure if that really helps knowing though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the time and date and host for when this happened so I can look at the logs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db2.lax Logs

db1.rno Logs

@tylerkaraszewski

Copy link
Copy Markdown
Contributor

Does this seem likely to be the log line for when db2.lax was shut down and this issue started on that node?

2026-06-02T00:27:35.987610+00:00 db2.lax bedrock: xxxxxx we@dont.know (BedrockServer.cpp:2159) _beginShutdown [main] [info] Beginning graceful shutdown due to 'Terminated', closing command port on '0.0.0.0:4444'.

@fukawi2

fukawi2 commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

I can't be positive since it was in the middle of a long saltfab run, but it seems highly likely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants