Handle server info query timeouts and zero returned responses when all rollouts crash #7

@xenshinu

(TaskRunner pid=1741302) [2025-10-28T05:53:31Z ERROR src/handlers.rs:283] Stream error: error decoding response body
(TaskRunner pid=1741302) [2025-10-28T05:53:31Z WARN src/handlers.rs:382] Stream failed on instance http://192.168.10.192:40000 due to decode_error [ignore stream_abort], removing instance and attempting to continue with another instance
(TaskRunner pid=1741302) [2025-10-28T05:53:31Z ERROR src/handlers.rs:310] Failed to connect to http://192.168.10.5:40000: error sending request for url (http://192.168.10.5:40000/generate) (type: connection_error)
(TaskRunner pid=1741302) [2025-10-28T05:53:31Z WARN src/handlers.rs:382] Stream failed on instance http://192.168.10.5:40000 due to connection_error [ignore stream_abort], removing instance and attempting to continue with another instance
(TaskRunner pid=1741302) [2025-10-28T05:53:31Z ERROR src/handlers.rs:310] Failed to connect to http://192.168.10.195:40000: error sending request for url (http://192.168.10.195:40000/generate) (type: connection_error)
(TaskRunner pid=1741302) [2025-10-28T05:53:31Z WARN src/handlers.rs:382] Stream failed on instance http://192.168.10.195:40000 due to connection_error [ignore stream_abort], removing instance and attempting to continue with another instance
(TaskRunner pid=1741302) [2025-10-28T05:53:33Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:33Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:33Z ERROR src/handlers.rs:310] Failed to connect to http://192.168.10.249:40000: error sending request for url (http://192.168.10.249:40000/generate) (type: connection_error)
(TaskRunner pid=1741302) [2025-10-28T05:53:35Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:35Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:37Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:37Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:39Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:39Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:41Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:41Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:43Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:43Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/handlers.rs:77] Received registration request for instance: http://192.168.10.195:40000 (id: cba98d8c-38ce-4ea3-8c09-ade70da625b0)
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/instance_manager.rs:7] Starting health check for instance: http://192.168.10.195:40000
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/handlers.rs:77] Received registration request for instance: http://192.168.10.192:40000 (id: d2f99040-0c85-4049-8620-b7fb753ab0c9)
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/instance_manager.rs:7] Starting health check for instance: http://192.168.10.192:40000
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/handlers.rs:77] Received registration request for instance: http://192.168.10.5:40000 (id: 6404bece-b2a4-4582-92bf-031c50b9691e)
(TaskRunner pid=1741302) [2025-10-28T05:53:44Z INFO src/instance_manager.rs:7] Starting health check for instance: http://192.168.10.5:40000
(TaskRunner pid=1741302) [2025-10-28T05:53:45Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:45Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:45Z INFO src/handlers.rs:64] Instance already registered and ready: http://192.168.10.200:40000
(TaskRunner pid=1741302) [2025-10-28T05:53:45Z INFO src/handlers.rs:64] Instance already registered and ready: http://192.168.10.249:40000
(TaskRunner pid=1741302) [2025-10-28T05:53:47Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:47Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:49Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:49Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:51Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.200:40000/get_server_info!
(TaskRunner pid=1741302) [2025-10-28T05:53:51Z WARN src/instance_manager.rs:54] Timeout when query server info at http://192.168.10.249:40000/get_server_info!
(WorkerDict pid=12890, ip=192.168.10.13) INFO:2025-10-28 01:53:51,467:--- Received Batch #3 with 1 prompt_responses ---
(WorkerDict pid=12890, ip=192.168.10.13) INFO:2025-10-28 01:53:51,467:Number of responses 0 != self.sample_n=8, got responses=[]
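In the log above, the `get_server_info` query for `192.168.10.200` and `192.168.10.249` times out every two seconds without either instance being removed, and the worker ends up with 0 responses while `sample_n=8`. A minimal sketch of one possible handling is below: count consecutive health-check failures, evict an instance after a threshold, and raise an explicit error instead of passing an empty response list downstream. Every name here (`InstancePool`, `max_failures`, `IncompleteRollout`, `check_responses`) is hypothetical and chosen for illustration; this is not the router's or the worker's actual code, and the threshold of 3 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class IncompleteRollout(RuntimeError):
    """Raised when fewer responses than requested come back (hypothetical)."""


@dataclass
class InstancePool:
    """Hypothetical registry of rollout instances keyed by URL."""

    max_failures: int = 3  # assumed threshold, not taken from the project
    failures: Dict[str, int] = field(default_factory=dict)

    def register(self, url: str) -> None:
        # A fresh registration resets the failure counter.
        self.failures[url] = 0

    def health_check(self, url: str, probe: Callable[[str], bool]) -> bool:
        """Run one probe; evict after max_failures consecutive misses."""
        if url not in self.failures:
            return False
        try:
            ok = probe(url)
        except TimeoutError:
            ok = False
        if ok:
            self.failures[url] = 0
            return True
        self.failures[url] += 1
        if self.failures[url] >= self.max_failures:
            # Stop polling an instance that keeps timing out.
            del self.failures[url]
        return False

    def healthy(self) -> List[str]:
        # Instances whose most recent probe succeeded (or that just registered).
        return [url for url, count in self.failures.items() if count == 0]


def check_responses(responses: List[str], sample_n: int) -> List[str]:
    """Fail loudly instead of handing a short or empty batch to the trainer."""
    if len(responses) != sample_n:
        raise IncompleteRollout(
            f"expected {sample_n} responses, got {len(responses)}"
        )
    return responses
```

With something along these lines, the caller could catch `IncompleteRollout` and decide whether to retry the prompt once instances re-register or to skip the batch, rather than continuing with `responses=[]`.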
