feat: Configurable liveness/readiness probe settings per pod #60

eliBenven · 2026-04-02T16:20:13Z

Summary

Nexlayer applies default liveness/readiness probes that kill containers if they don't respond within ~15-30 seconds. Many legitimate workloads need significantly longer startup times. There's currently no way to configure probe behavior in the launchfile.

Proposed API

application:
  name: my-app
  pods:
    - name: game-server
      image: itzg/minecraft-server:latest
      servicePorts:
        - 25565
      probes:
        startupDelaySeconds: 120    # wait before first probe
        periodSeconds: 30           # time between probes
        failureThreshold: 10        # failures before kill
        # or disable entirely for non-HTTP workloads:
        # enabled: false
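
For reference, the three proposed fields have close counterparts in the Kubernetes probe spec. The sketch below is only an assumption about how a launchfile `probes` block might translate, not a description of Nexlayer's actual implementation. The Pod name, image, health path, and port are placeholders; the field names under `livenessProbe` are standard Kubernetes fields.

```yaml
# Hypothetical translation of the proposed launchfile fields into a
# Kubernetes Pod spec. The mapping itself is an assumption.
apiVersion: v1
kind: Pod
metadata:
  name: slow-app
spec:
  containers:
    - name: slow-app
      image: example/slow-app:latest   # placeholder image
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz               # placeholder health endpoint
          port: 8080
        initialDelaySeconds: 120       # <- probes.startupDelaySeconds
        periodSeconds: 30              # <- probes.periodSeconds
        failureThreshold: 10           # <- probes.failureThreshold
```

With these values the first probe fires at about 120 s, and the container is restarted only after 10 consecutive failures spaced 30 s apart. A container that never becomes healthy therefore survives for roughly 390-420 s (120 + 10 × 30 at most), instead of the ~15-30 seconds available today.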

Use Cases

1. Game Servers

A Minecraft Paper server needs 60-90s to remap classes and generate world chunks; The Isle and ARK servers can take 2-3 minutes. The current probes kill them before they are ready, leaving them in a CrashLoopBackOff cycle (observed in the stress test: Minecraft hit 146 restart attempts).

2. Java/JVM Applications

Spring Boot apps with large classpaths, Elasticsearch with index recovery, and Kafka brokers all commonly need 30-120s of startup time.

3. ML Model Loading

Ollama pulling and loading a large model, or any ML inference server loading weights into memory, can take minutes on first boot.

4. Database Recovery

For Postgres with WAL replay, Elasticsearch with shard recovery, and MongoDB with journal replay, startup time scales with data volume and is unpredictable.
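
Because recovery time is unpredictable, a fixed initial delay tends to be either too short or needlessly long. One common Kubernetes pattern for this case is a `startupProbe`: liveness checks are held off until the startup probe succeeds once, and `failureThreshold × periodSeconds` sets the startup ceiling. The following is a sketch of that pattern, not something the launchfile supports today. The Pod name, image tag, password, and thresholds are illustrative; `pg_isready` is the standard PostgreSQL readiness utility.

```yaml
# Illustrative Kubernetes Pod: generous startup ceiling, strict liveness afterwards.
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: example               # placeholder, do not use in production
      ports:
        - containerPort: 5432
      startupProbe:
        exec:
          command: ["pg_isready", "-U", "postgres"]
        periodSeconds: 10
        failureThreshold: 60           # up to 60 x 10 s = 600 s to finish recovery
      livenessProbe:
        exec:
          command: ["pg_isready", "-U", "postgres"]
        periodSeconds: 10
        failureThreshold: 3            # applies only after startup has succeeded
```

A fast start is not penalized here: the startup probe passes on the first successful check, so the 600 s figure is a ceiling and not a delay.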

5. Non-HTTP Workloads

Game servers, MQTT brokers, and other TCP/UDP services don't serve HTTP at all, so HTTP-based liveness probes will always fail. These workloads need either TCP probes, exec probes, or the ability to disable probes entirely.
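
For TCP services, Kubernetes offers a `tcpSocket` probe that succeeds as soon as the port accepts a connection, with no HTTP involved. Below is a minimal sketch, assuming the same Minecraft image and port as the proposed API above. Whether Nexlayer would expose this through the launchfile is an open question, and the thresholds are illustrative.

```yaml
# Illustrative Kubernetes Pod using TCP probes instead of HTTP probes.
apiVersion: v1
kind: Pod
metadata:
  name: game-server
spec:
  containers:
    - name: game-server
      image: itzg/minecraft-server:latest
      env:
        - name: EULA                   # required by this image to start the server
          value: "TRUE"
      ports:
        - containerPort: 25565
      startupProbe:
        tcpSocket:
          port: 25565
        periodSeconds: 10
        failureThreshold: 30           # up to 30 x 10 s = 300 s to open the port
      livenessProbe:
        tcpSocket:
          port: 25565
        periodSeconds: 30
        failureThreshold: 3
```

Kubernetes has no built-in UDP probe type, so a UDP-only service would still need an exec probe or the proposed `enabled: false` option.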

Evidence from Stress Test

App	Restarts	Root Cause
Minecraft	146+	Server boots ~60s, killed by probe before ready
Ollama	stuck rolling	GPU model loading takes minutes

References

#48: Slow-starting containers killed by liveness probe (Minecraft, heavy Java apps)
#59: feat: Raw TCP/UDP port exposure for non-HTTP workloads (discussion; non-HTTP workloads can't respond to HTTP probes)

Migrated from #57
