Skip to content

Handle server disconnection#263

Open
byroot wants to merge 1 commit intotrilogy-libraries:mainfrom
byroot:timeout-desync
Open

Handle server disconnection#263
byroot wants to merge 1 commit intotrilogy-libraries:mainfrom
byroot:timeout-desync

Conversation

@byroot
Copy link
Collaborator

@byroot byroot commented Jan 29, 2026

Fix: #257

When the server triggers a disconnection, we receive a packet with sequence 0. Hence we can't be so strict about sequence ordering.

return i;
}

parser->sequence_number++;
Copy link
Contributor

@HeyNonster HeyNonster Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good! I have one question though:

Is there any scenario where we would want to continue reading responses after getting a 0 response? I assume any 0 response that's unexpected will be an error/closed connection but I'm not positive.

5	0	Server Greeting  proto=10 version=8.0.44 
7	1	Login Request user=root 
9	2	Response  OK 
11	0	Request Query
13	1	Response  OK 
15	0	Response  Error 4031 <-- end here, connection closed.
19	0	Request Query

Because I think if we increment here then the parser would next expect a sequence number of 2 but we haven't gotten 1 back yet. It's fine in this scenario because we'll never get a 1 back anyway.

It's likely I'm overthinking this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're not wrong. I think my test should at least assert the state of the connection after the error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. I added a test, but the behavior seem quite inconsistent across mysql implementations and platforms :/

Fix: trilogy-libraries#257

When the server triggers a disconnection, we receive a packet
with sequence 0. Hence we can't be so strict about sequence
ordering.
Comment on lines +1034 to +1040
assert_includes ex.message, "The client was disconnected by the server because of inactivity"
assert_equal 4031, ex.error_code
end

assert_raises Trilogy::EOFError do
client.query("SELECT 1")
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@composerinteralia if the server is sending back a 4031 here should Trilogy know the connection is closed and then return a ConnectionClosed error instead of an EOFError?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes, in practice it's way trickier to implement than it sounds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to find any mysql documentation that details all of the possible out-of-sequence messages and I've come up blank. 😞

Copy link
Collaborator

@composerinteralia composerinteralia Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's ideal but probably not so easy to do. It's also a bit awkward that trilogy_error_recoverable_p will return true for this case, but we'd find out quickly on the next attempt that the connection is closed, so 🤷🏻.

I guess the only worry would be if there are any server-initiated communications that don't also close the connection immediately. That'd leave us in a bad state. But it might not be possible. I'm not too sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any server-initiated communications that don't also close the connection immediately

Yeah, this is what I was trying to find documentation for but found nada 🤷

Copy link

@Thomascountz Thomascountz Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In libmysqlclient, when it expects a pkt_nr of 1, but it gets 0, it marks the socket as unusable and proceeds with parsing, since "The caller should handle the error code in the packet..."

https://github.com/mysql/mysql-server/blob/447eb26e094b444a88c532028647e48228c3c04f/sql-common/net_serv.cc#L1518-L1560

The logic is:

  1. If the sequence isn't what we expect: if (pkt_nr != (uchar)net->pkt_nr)
  2. And what we expected was a 1: if (net->pkt_nr == 1)
  3. Mark the socket as permanently dead: net->error = NET_ERROR_SOCKET_UNUSABLE;
  4. Record that we detected an out-of-order packet: net->last_errno = ER_NET_PACKETS_OUT_OF_ORDER;
  5. Update our expected sequence number to be 1: net->pkt_nr = pkt_nr + 1;
  6. Return false meaning "no error reading the header" and continue parsing: return false

So, I'm thinking we can do the same. We can assume the socket is closed in this specific case (expecting 1, got 0) and continue parsing/handling the error message. I think it's possible there are other 0-sequence out-of-order interruptions from the server, but this is the only "documented" case I could find.

}
case S_SEQ: {
if (cur_byte != parser->sequence_number) {
if (cur_byte != parser->sequence_number && cur_byte > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some vague concerns that this could mask some legitimate error case, but I couldn't think of any such cases so maybe it's fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it felt a bit bad writtting this, but I have no better idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] TRILOGY_INVALID_SEQUENCE_ID after exceeding a session wait_timeout

4 participants