Failure to get an initial Kafka connection should terminate or cause non-liveness #29

@solsson

If the Kafka client fails to connect, we currently end up in the following state: liveness reports UP while readiness reports DOWN.

# curl localhost:8090/health/live
{
    "status": "UP",
    "checks": [
        {
            "name": "REST liveness",
            "status": "UP"
        }
    ]
}
# curl localhost:8090/health/ready
{
    "status": "DOWN",
    "checks": [
        {
            "name": "consume-loop",
            "status": "DOWN",
            "data": {
                "stage": "WaitingForKafkaConnection"
            }
        }
    ]
}

This service probably needs to take a stance on https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html from a sidecar perspective.

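The cause of the above state is that the consumer thread times out while fetching topic metadata and closes the consumer, while the HTTP server keeps running: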

2019-09-28 09:02:40,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) At stage Initializing before infinite polls with consumer org.apache.kafka.clients.consumer.KafkaConsumer@7f77c83013f0
2019-09-28 09:02:42,063 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,197 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,402 ERROR [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) A Kafka timeout occured at stage WaitingForKafkaConnection: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Exception in thread "kafkaclient" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2019-09-28 09:02:45,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Closing consumer ...
2019-09-28 09:02:45,407 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Consumer closed at stage WaitingForKafkaConnection; Use liveness probes with /health for app termination
2019-09-30 11:26:24,917 ERROR [org.jbo.res.res.i18n] (executor-thread-1) RESTEASY002010: Failed to execute: javax.ws.rs.ServiceUnavailableException: Denied because cache isn't started yet, check /health for status
    at se.yolean.kafka.keyvalue.http.CacheResource.requireUpToDateCache(CacheResource.java:43)
    at se.yolean.kafka.keyvalue.http.CacheResource.keysJson(CacheResource.java:128)

Meanwhile the REST endpoints respond with 503:

# curl --verbose localhost:8090/cache/v1/keys
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8090 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8090 (#0)
> GET /cache/v1/keys HTTP/1.1
> Host: localhost:8090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Connection: keep-alive
< Content-Length: 0
< Date: Mon, 30 Sep 2019 11:26:30 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
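
One possible direction, as a rough sketch rather than a proposal for the actual implementation: let the death of the "kafkaclient" thread flip the liveness state, so that a liveness probe can restart the container. The class name `ConsumerLiveness`, the `Polling` stage and the method names below are hypothetical; only the thread name and the `Initializing` and `WaitingForKafkaConnection` stages come from the logs above. The sketch uses plain Java with no health-check framework, so wiring `isLive()` into /health/live is left out.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ConsumerLiveness {

    // Hypothetical stage list; "Polling" is an assumed name for the healthy state.
    public enum Stage { Initializing, WaitingForKafkaConnection, Polling }

    private final AtomicReference<Stage> stage = new AtomicReference<>(Stage.Initializing);
    private final AtomicReference<Throwable> fatal = new AtomicReference<>();

    public void setStage(Stage s) {
        stage.set(s);
    }

    /** Runs the consume loop on a thread named as in the logs and records any uncaught exception. */
    public Thread start(Runnable consumeLoop) {
        Thread t = new Thread(consumeLoop, "kafkaclient");
        // The handler runs on the dying thread, before the thread terminates.
        t.setUncaughtExceptionHandler((thread, e) -> fatal.compareAndSet(null, e));
        t.start();
        return t;
    }

    /** Liveness: false once the consumer thread has died from an uncaught exception. */
    public boolean isLive() {
        return fatal.get() == null;
    }

    /** Readiness: live and actually polling. */
    public boolean isReady() {
        return isLive() && stage.get() == Stage.Polling;
    }

    public static void main(String[] args) throws InterruptedException {
        ConsumerLiveness health = new ConsumerLiveness();
        // Simulates the failure above: the loop never gets past the connection stage.
        Thread t = health.start(() -> {
            health.setStage(Stage.WaitingForKafkaConnection);
            throw new IllegalStateException("Timeout expired while fetching topic metadata");
        });
        t.join();
        System.out.println("live=" + health.isLive() + " ready=" + health.isReady());
        // prints: live=false ready=false
    }
}
```

Whether non-liveness is preferable to terminating the process outright is the open question here. With a sidecar, a failed liveness probe restarts only this container, which is roughly what exiting would achieve as well, so the handler could just as well trigger a shutdown.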
