Fix duplicate channel.close frames when client crashes #234
Add two specs that verify the proxy does not send duplicate Channel::Close frames to upstream when clients crash:
1. Client-initiated close: Opens multiple channels, sends Channel::Close for all of them, then crashes before receiving CloseOk. Without the fix, this causes duplicate Channel::Close frames, which trigger RabbitMQ to close the connection with an "expected 'channel.open'" error.
2. Upstream-initiated close: Triggers a channel error (consume from non-existent queue), receives Channel::Close from upstream, then crashes before sending CloseOk. The proxy should have already sent CloseOk to upstream.
These specs reproduce the reported issue, where client crashes could cause the upstream connection to be closed, affecting all clients sharing that connection.
7d01825 to
59ee86f
When a client sends Channel::Close and then crashes before receiving CloseOk, the proxy would send a duplicate Channel::Close from close_all_upstream_channels, causing the upstream to close the connection with an "expected 'channel.open'" error.
Fix: Delete the channel from @channel_map immediately when forwarding Channel::Close, rather than waiting for CloseOk. This way, if the client crashes, close_all_upstream_channels won't find the channel and won't send a duplicate close.
This is simpler than using nil as a sentinel value - just remove the channel from the map when it starts closing, regardless of which side initiated the close.
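The close-then-crash sequence described in this commit can be modeled in a few lines. The following is only a simplified Python sketch, not the proxy's actual Crystal code: the Upstream and ClientSession classes, the forward_close method, and the delete_on_close flag are hypothetical names chosen for illustration, and the broker is assumed to drop the connection on any close for a channel it no longer considers open.

```python
from dataclasses import dataclass, field


@dataclass
class Upstream:
    """Hypothetical stand-in for the broker side of one upstream connection."""
    open_channels: set = field(default_factory=set)
    sent: list = field(default_factory=list)
    connection_open: bool = True

    def open_channel(self, ch):
        self.open_channels.add(ch)

    def channel_close(self, ch, reason):
        self.sent.append((ch, reason))
        if ch not in self.open_channels:
            # Assumed broker behavior: a close for a channel that is already
            # closing tears down the whole connection.
            self.connection_open = False
        self.open_channels.discard(ch)


class ClientSession:
    """Hypothetical model of the per-client channel map."""

    def __init__(self, upstream, delete_on_close):
        self.upstream = upstream
        self.delete_on_close = delete_on_close
        self.channel_map = {}  # client channel id -> upstream channel id

    def open(self, client_ch, upstream_ch):
        self.upstream.open_channel(upstream_ch)
        self.channel_map[client_ch] = upstream_ch

    def forward_close(self, client_ch):
        # The client sent Channel::Close; forward it upstream.
        upstream_ch = self.channel_map[client_ch]
        if self.delete_on_close:
            # The fix: forget the channel as soon as it starts closing.
            del self.channel_map[client_ch]
        self.upstream.channel_close(upstream_ch, "client close")

    def close_all_upstream_channels(self):
        # Cleanup that runs when the client disconnects.
        for upstream_ch in self.channel_map.values():
            self.upstream.channel_close(upstream_ch, "CLIENT_DISCONNECTED")
        self.channel_map.clear()


def crash_after_close(delete_on_close):
    upstream = Upstream()
    session = ClientSession(upstream, delete_on_close)
    session.open(1, 1)
    session.forward_close(1)
    # The client crashes before CloseOk arrives, so cleanup runs right away.
    session.close_all_upstream_channels()
    return upstream
```

In this model, crash_after_close(False) records two close frames for the same channel and the shared connection ends up closed, while crash_after_close(True) records only one and the connection stays open.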
59ee86f to
212b5c6
carlhoerberg
left a comment
Good! Should we retype @channel_map = Hash(UInt16, UpstreamChannel?).new so that it can't be nil?
Was the nil value never used before either?
Specs are new to me [added to my "learn someday" list]. For now, I took the 2 specs that you added and ran them (with slight tweaks) as stand-alone functions.
Proposed change to the specs:
The channel_map no longer stores nil values after the previous fix changed from assigning nil to using delete(). This simplifies the type from Hash(UInt16, UpstreamChannel?) to Hash(UInt16, UpstreamChannel) and removes the unnecessary .try call when closing channels.
Good catch,
Added a canary connection in both channel.close specs that stays open throughout the test. If the bug exists (duplicate Channel::Close sent), the upstream connection would close, affecting ALL clients including the canary. Verifying the canary remains open proves the upstream connection didn't close due to duplicate frames. This addresses Chad's concern that checking upstream_connections.should eq 1 alone might not be meaningful if the proxy quickly reconnects.
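To illustrate why a connection count alone can miss the failure, here is a small Python sketch of the canary idea. It is a model under stated assumptions, not the spec code: the Proxy and UpstreamConnection classes and their methods are hypothetical, and the proxy is assumed to reconnect immediately after the broker closes the shared connection.

```python
class UpstreamConnection:
    """Hypothetical shared upstream connection."""

    def __init__(self):
        self.open = True


class Proxy:
    """Hypothetical proxy holding one shared upstream connection."""

    def __init__(self):
        self.upstream = UpstreamConnection()

    def attach_client(self):
        # Every client shares the current upstream connection.
        return self.upstream

    def upstream_closed_by_broker(self):
        # The broker drops the connection (e.g. after a duplicate close)...
        self.upstream.open = False
        # ...and the proxy is assumed to reconnect right away.
        self.upstream = UpstreamConnection()

    def upstream_connections(self):
        return 1 if self.upstream.open else 0


def run(bug_present):
    proxy = Proxy()
    canary = proxy.attach_client()  # stays attached for the whole test
    if bug_present:
        proxy.upstream_closed_by_broker()
    return proxy.upstream_connections(), canary.open
```

Under these assumptions, run(True) still reports one upstream connection, but the canary's connection is no longer open, which is the signal the count alone would hide.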
Good idea! Both specs now open a second connection that stays alive throughout the test. If the bug exists and sends duplicate
chadknutson
left a comment
Looks good now!
Problem
When a client crashes (e.g., ctrl-c), the proxy can send duplicate Channel::Close frames to the upstream broker:
1. Client sends Channel::Close for a channel
2. Client crashes before receiving CloseOk
3. close_all_upstream_channels sends another Channel::Close with reason "CLIENT_DISCONNECTED"
The upstream broker sees Channel::Close for a channel that's already closing/closed, causing an "expected 'channel.open'" error.
This closes the entire upstream connection, affecting all clients sharing that connection.
Evidence from investigation
tcpdump on upstream showed: "back to back Channel.Close reply=CLIENT_DISCONNECTED frames"
Reliable reproduction: rapidly open/close channels, then kill client with ctrl-c.
Solution
Delete the channel from @channel_map immediately when forwarding Channel::Close, rather than waiting for CloseOk. This way, if the client crashes, close_all_upstream_channels won't find the channel and won't send a duplicate close.
This is simpler than using nil as a sentinel value (as proposed in #233) - just remove the channel from the map when it starts closing, regardless of which side initiated the close.
Changes
- Channel::Close in read_loop: delete from map and forward to upstream
- Combine the Channel::Close and Channel::CloseOk cases in the write method, since both just delete from map
Test plan