What We Observed
We've been experiencing recurring NodeNotReady incidents caused by EFS NFS mounts becoming stuck after the efs-proxy → EFS TCP connection goes silent. Across multiple incidents, we consistently noticed a ~2-minute gap between the start of TCP retransmits on the efs-proxy → EFS connection and complete NFS session collapse.
Environment
- efs-utils: 3.1.1
- aws-efs-csi-driver: v3.1.0 (EKS managed add-on)
- Kubernetes: v1.34 (Amazon EKS, ap-northeast-2)
- Node: c7i.8xlarge
What We Noticed in the Source
Looking at configure_stream() in src/proxy/src/connections.rs, we noticed it only sets TCP_NODELAY:
pub fn configure_stream(tcp_stream: TcpStream) -> TcpStream {
    match tcp_stream.set_nodelay(true) {
        Ok(()) => {}
        Err(e) => { warn!("failed to set TCP_NODELAY: {:?}", e); }
    }
    tcp_stream
    // SO_KEEPALIVE and TCP_USER_TIMEOUT do not appear to be set here
}
Cargo.toml includes tokio = { features = ["full"] } and libc, which we understand provide the APIs needed to configure both options — so we're wondering if this was an intentional design choice, or something that could be added.
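For illustration, here is a minimal sketch of what setting both options on a connected socket could look like. This is not efs-proxy code: `configure_dead_peer_detection`, `set_opt` and `get_opt` are hypothetical helper names, and the option constants are Linux values declared inline only to keep the example dependency-free (the `libc` crate exposes the same names and `setsockopt`). The timeout value is left to the caller.

```rust
use std::io;
use std::mem::size_of;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// Linux option values, declared inline so the sketch needs no crates.
// With the libc crate these would be libc::SOL_SOCKET, libc::SO_KEEPALIVE,
// libc::IPPROTO_TCP and libc::TCP_USER_TIMEOUT.
const SOL_SOCKET: c_int = 1;
const SO_KEEPALIVE: c_int = 9;
const IPPROTO_TCP: c_int = 6;
const TCP_USER_TIMEOUT: c_int = 18;

unsafe extern "C" {
    // socklen_t is a 32-bit unsigned integer on Linux.
    fn setsockopt(fd: c_int, level: c_int, name: c_int, val: *const c_void, len: u32) -> c_int;
    fn getsockopt(fd: c_int, level: c_int, name: c_int, val: *mut c_void, len: *mut u32) -> c_int;
}

/// Hypothetical helper: set an integer-valued socket option.
fn set_opt<S: AsRawFd>(sock: &S, level: c_int, name: c_int, value: c_int) -> io::Result<()> {
    let ret = unsafe {
        setsockopt(
            sock.as_raw_fd(),
            level,
            name,
            &value as *const c_int as *const c_void,
            size_of::<c_int>() as u32,
        )
    };
    if ret == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
}

/// Hypothetical helper: read an integer-valued socket option back.
fn get_opt<S: AsRawFd>(sock: &S, level: c_int, name: c_int) -> io::Result<c_int> {
    let mut value: c_int = 0;
    let mut len = size_of::<c_int>() as u32;
    let ret = unsafe {
        getsockopt(
            sock.as_raw_fd(),
            level,
            name,
            &mut value as *mut c_int as *mut c_void,
            &mut len,
        )
    };
    if ret == 0 { Ok(value) } else { Err(io::Error::last_os_error()) }
}

/// Hypothetical helper: enable SO_KEEPALIVE and set TCP_USER_TIMEOUT
/// (in milliseconds) on an already-connected socket.
pub fn configure_dead_peer_detection<S: AsRawFd>(sock: &S, user_timeout_ms: u32) -> io::Result<()> {
    set_opt(sock, SOL_SOCKET, SO_KEEPALIVE, 1)?;
    set_opt(sock, IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms as c_int)
}
```

Since tokio's TcpStream implements AsRawFd on Unix, we assume a call such as `configure_dead_peer_detection(&tcp_stream, 25_000)` could sit next to the existing set_nodelay call, with failures logged rather than treated as fatal. We have not tested this inside efs-proxy.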
Our Interpretation
Our understanding (happy to be corrected):
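- Without TCP_USER_TIMEOUT, when the TCP path becomes unresponsive during active NFS I/O, the kernel falls back to RTO exponential backoff before declaring the connection dead. We observed roughly 2 minutes between the first retransmits and session collapse. Note that with the stock net.ipv4.tcp_retries2 value of 15 the kernel itself can keep retransmitting far longer (the kernel documentation cites a hypothetical 924.6 seconds), so the 2-minute figure may be driven by a non-default setting or by another timer in the stack.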
- SO_KEEPALIVE would help detect dead connections during idle periods, but wouldn't fire when NFS I/O is actively in flight, which is our typical scenario.
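- The NFS v4.1 lease appears to be ~90 seconds. RFC 5661 leaves the lease_time attribute up to the server, so this is an assumed value (90 seconds is a common server default) rather than a protocol default. If detection takes ~2 minutes, efs-proxy reconnects after the session has already expired on the EFS side, making session recovery impossible.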
- Setting TCP_USER_TIMEOUT to a value shorter than the NFS v4.1 lease (e.g., 25s) might allow efs-proxy to detect and reconnect within the lease window, enabling transparent session recovery.
We recognize this is our interpretation based on observed timing. If the 2-minute behavior is intentional, or if there's something else in the connection lifecycle we're missing, we'd appreciate the clarification.
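To sanity-check the timing, the backoff arithmetic can be sketched as below. The function names are hypothetical, and the inputs are assumptions: a 200 ms minimum RTO, a 120 s maximum RTO, and net.ipv4.tcp_retries2 = 15, which are the stock Linux values as far as we know. The real RTO depends on measured round-trip time, so this is only a lower bound. The 90-second lease and 25-second timeout are the figures from our interpretation above, not confirmed EFS values.

```rust
/// Lower bound, in milliseconds, on how long the kernel keeps retransmitting
/// before giving up: `retries` retransmissions with a doubling timeout capped
/// at `rto_max_ms`, plus the final wait after the last retransmission.
pub fn rto_give_up_ms(retries: u32, rto_min_ms: u64, rto_max_ms: u64) -> u64 {
    let mut total = 0;
    let mut rto = rto_min_ms;
    for _ in 0..=retries {
        total += rto;
        rto = (rto * 2).min(rto_max_ms);
    }
    total
}

/// True if a dead connection would be detected before the lease runs out.
pub fn detects_within_lease(detect_ms: u64, lease_ms: u64) -> bool {
    detect_ms < lease_ms
}

/// Assumed stock Linux values (not read from a live system).
pub const ASSUMED_RTO_MIN_MS: u64 = 200;
pub const ASSUMED_RTO_MAX_MS: u64 = 120_000;
pub const ASSUMED_TCP_RETRIES2: u32 = 15;
```

Under these assumptions `rto_give_up_ms(ASSUMED_TCP_RETRIES2, ASSUMED_RTO_MIN_MS, ASSUMED_RTO_MAX_MS)` comes to 924,600 ms, well beyond both the ~2 minutes we observed and a 90-second lease, whereas a 25,000 ms TCP_USER_TIMEOUT would fall inside the lease. It may therefore be worth checking the node's actual sysctl values when interpreting the 2-minute gap.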
Question for Maintainers
Is the absence of TCP_USER_TIMEOUT (and SO_KEEPALIVE) on the EFS-side socket intentional? If so, is there another mechanism that handles dead connection detection within the NFS v4.1 lease window?
Related