SetStorageTargetInfo deserialization fails for "Invalid value 224 for conversion into enum TargetConsistencyState" #2

@iamjoemccormick

Description

On a system with meta and storage buddy mirroring (probably relevant) and quotas (probably less relevant), if I start the management service after it has been stopped for some time (for example, after rebuilding), I typically see these errors logged:

[2024-12-03T18:23:16Z ERROR] Closed stream from 127.0.0.1:39064: target::SetStorageTargetInfo (2099) from 127.0.0.1:39064: BeeMsg body deserialization failed: Invalid value 224 for conversion into enum TargetConsistencyState
[2024-12-03T18:23:16Z ERROR] Closed stream from 127.0.0.1:39080: target::SetStorageTargetInfo (2099) from 127.0.0.1:39080: BeeMsg body deserialization failed: Invalid value 224 for conversion into enum TargetConsistencyState
[2024-12-03T18:23:27Z ERROR] Closed stream from 127.0.0.1:38210: target::SetStorageTargetInfo (2099) from 127.0.0.1:38210: BeeMsg body deserialization failed: Invalid value 224 for conversion into enum TargetConsistencyState
[2024-12-03T18:23:27Z ERROR] Closed stream from 127.0.0.1:38216: target::SetStorageTargetInfo (2099) from 127.0.0.1:38216: BeeMsg body deserialization failed: Invalid value 224 for conversion into enum TargetConsistencyState

At the same time, at debug level, one of my meta services logs:

(2) 18:23:10.316062 XNodeSync [MessagingTk.cpp:447] >> Unable to connect, is the node offline? node: beegfs-mgmtd management [ID: 1]; Message type: GetStatesAndBuddyGroups (1053)
(4) 18:23:10.316433 XNodeSync [TargetStateStore.h:60] >> Setting all states. New state: Probably-offline; Called from: meta/build/beegfs-meta(_ZN15InternodeSyncer41downloadAndSyncTargetStatesAndBuddyGroupsEv+0x27c) [0xaaaaac2b5d90]; meta/build/beegfs-meta(_ZN15InternodeSyncer8syncLoopEv+0x420) [0xaaaaac2bb748]; meta/build/beegfs-meta(_ZN15InternodeSyncer3runEv+0x1c) [0xaaaaac2bbbb4].
(4) 18:23:16.311707 XNodeSync [NodeConn (acquire stream)] >> Establishing new TCP connection to: beegfs-mgmtd@127.0.0.1:8008
(4) 18:23:16.311869 XNodeSync [StandardSocket.cpp:142] >> Connect StandardSocket socket: 0xaaaadfa12a90; addr: 127.0.0.1:8008; bindIP: 0.0.0.0
(3) 18:23:16.312104 XNodeSync [NodeConn (acquire stream)] >> Connected: beegfs-mgmtd@127.0.0.1:8008 (protocol: TCP)
(0) 18:23:16.313482 XNodeSync [Messaging (RPC)] >> Communication error: Soft disconnect from 127.0.0.1:8008; Peer: beegfs-mgmtd management [ID: 1]. (Message type: SetStorageTargetInfo (2099))
(4) 18:23:16.313584 XNodeSync [NodeConn (invalidate stream)] >> Disconnected: beegfs-mgmtd@127.0.0.1:8008
(2) 18:23:16.313612 XNodeSync [MessagingTk.cpp:25] >> Retrying communication. peer: beegfs-mgmtd management [ID: 1]; message type: SetStorageTargetInfo (2099)
(4) 18:23:16.313620 XNodeSync [NodeConn (acquire stream)] >> Establishing new TCP connection to: beegfs-mgmtd@127.0.0.1:8008
(4) 18:23:16.313633 XNodeSync [StandardSocket.cpp:142] >> Connect StandardSocket socket: 0xaaaadfa12a90; addr: 127.0.0.1:8008; bindIP: 0.0.0.0
(3) 18:23:16.313690 XNodeSync [NodeConn (acquire stream)] >> Connected: beegfs-mgmtd@127.0.0.1:8008 (protocol: TCP)
(0) 18:23:16.314081 XNodeSync [Messaging (RPC)] >> Communication error: Soft disconnect from 127.0.0.1:8008; Peer: beegfs-mgmtd management [ID: 1]. (Message type: SetStorageTargetInfo (2099))
(4) 18:23:16.314142 XNodeSync [NodeConn (invalidate stream)] >> Disconnected: beegfs-mgmtd@127.0.0.1:8008

Because I have buddy mirroring enabled (meta and storage mirrors), my initial theory was that something with the TargetConsistencyState enum was not set up correctly to deserialize states other than Good (I'm guessing my targets are in the needs-resync state). But as best I can tell, all the serialization for SetStorageTargetInfo, TargetInfo, and TargetConsistencyState is correct, so I'm not sure where the value 224 is coming from.

It is also entirely possible that this is a longstanding bug in the meta/storage services.

Perhaps notable: we have only observed this on aarch64 (Linux beemac 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:56:10 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux). If we get additional reports, we should check whether the issue consistently happens only on Arm systems.
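
For anyone trying to reproduce the deserialization failure in isolation, here is a minimal sketch of a checked byte-to-enum conversion that rejects out-of-range values with the same message text as the management log. Everything in it is an assumption for illustration: the variant names, the discriminants 0, 1, and 2, the `InvalidEnumValue` error type, and the `decode_states` helper are hypothetical and are not taken from the actual management code, which may define and decode the enum differently.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical stand-in for the consistency state carried per target.
/// The discriminants (0, 1, 2) are assumed for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum TargetConsistencyState {
    Good = 0,
    NeedsResync = 1,
    Bad = 2,
}

/// Hypothetical error type; formats like the message seen in the log.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidEnumValue {
    pub value: u8,
    pub enum_name: &'static str,
}

impl fmt::Display for InvalidEnumValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Invalid value {} for conversion into enum {}",
            self.value, self.enum_name
        )
    }
}

impl TryFrom<u8> for TargetConsistencyState {
    type Error = InvalidEnumValue;

    /// Accept only the known discriminants; reject every other byte.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Good),
            1 => Ok(Self::NeedsResync),
            2 => Ok(Self::Bad),
            other => Err(InvalidEnumValue {
                value: other,
                enum_name: "TargetConsistencyState",
            }),
        }
    }
}

/// Decode a run of raw state bytes, stopping at the first invalid one.
pub fn decode_states(raw: &[u8]) -> Result<Vec<TargetConsistencyState>, InvalidEnumValue> {
    raw.iter()
        .copied()
        .map(TargetConsistencyState::try_from)
        .collect()
}
```

Under these assumed discriminants, a needs-resync state would decode cleanly, and a byte such as 224 would be rejected whatever state the sender meant to report. That would be consistent with the unexpected value originating on the sending side or in the message layout rather than in the enum conversion itself, though this sketch cannot confirm that.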

Metadata

Labels

bug: Something isn't working
