Potential memory leak in 4.0.x #608

@eiabea

Since upgrading to the new 4.0.x version in our k8s cluster, we have been seeing periodic OOM kills.

Image

We limited the RAM to 640Mi.

The logs seem fine. We have a pretty unstable connection to our RPC endpoint; could this be part of the issue?

2025-11-14 07:38:01 INFO     Retrying consensus uri /eth/v2/beacon/blocks/head, attempt 2
2025-11-14 07:38:03 INFO     Retrying consensus uri /eth/v2/beacon/blocks/head, attempt 3
2025-11-14 07:38:03 INFO     Retrying consensus uri /eth/v2/beacon/blocks/head, attempt 3
2025-11-14 07:38:05 ERROR    Cannot connect to host rpc.oca.at:443 ssl:default [Name or service not known]
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 1532, in _create_direct_connection
    hosts = await self._resolve_host(host, port, traces=traces)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 1148, in _resolve_host
    return await asyncio.shield(resolved_host_task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 1179, in _resolve_host_with_throttle
    addrs = await self._resolver.resolve(host, port, family=self._family)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/resolver.py", line 40, in resolve
    infos = await self._loop.getaddrinfo(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 900, in getaddrinfo
    return await self.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/socket.py", line 976, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/common/tasks.py", line 17, in run
    await self.process_block(interrupt_handler)
  File "/app/src/commands/start/base.py", line 126, in process_block
    await asyncio.gather(*subtasks)
  File "/app/src/validators/tasks.py", line 78, in process
    compounding_validators_balances = await fetch_compounding_validators_balances()
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/validators/consensus.py", line 33, in fetch_compounding_validators_balances
    consensus_validators = await fetch_consensus_validators(list(vault_public_keys), slot=slot)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/validators/consensus.py", line 61, in fetch_consensus_validators
    beacon_validators = await consensus_client.get_validators_by_ids(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/sw_utils/consensus.py", line 86, in get_validators_by_ids
    return await self._async_make_post_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/sw_utils/consensus.py", line 183, in _async_make_post_request
    raise error
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/sw_utils/consensus.py", line 174, in _async_make_post_request
    async with session.post(uri, json=data) as response:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 1488, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 770, in _request
    resp = await handler(req)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 725, in _connect_and_send_request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 642, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 1209, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/aiohttp/connector.py", line 1538, in _create_direct_connection
    raise ClientConnectorDNSError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorDNSError: Cannot connect to host rpc.oca.at:443 ssl:default [Name or service not known]
2025-11-14 07:39:12 INFO     Retrying execution method eth_getBalance, attempt 2
2025-11-14 07:39:12 INFO     Retrying execution method eth_blockNumber, attempt 2
2025-11-14 07:39:12 INFO     Retrying execution method eth_blockNumber, attempt 2
2025-11-14 07:39:14 INFO     Retrying execution method eth_getBalance, attempt 3
2025-11-14 07:39:14 INFO     Retrying execution method eth_blockNumber, attempt 3
2025-11-14 07:39:14 INFO     Retrying execution method eth_blockNumber, attempt 3
2025-11-14 07:39:18 INFO     Retrying execution method eth_getBalance, attempt 4
2025-11-14 07:39:18 INFO     Retrying execution method eth_blockNumber, attempt 4
2025-11-14 07:39:20 INFO     Retrying execution method eth_blockNumber, attempt 4
2025-11-14 08:06:49 INFO     Scanned 3 DepositEvent events, block 1618080 of 1618080
2025-11-14 08:26:01 INFO     Scanned 1 DepositEvent events, block 1618170 of 1618170
2025-11-14 08:26:48 INFO     Starting vault 0xdbdA1DF83c30ccd89b410cd6848EA15D2b3CE961 harvest
2025-11-14 08:26:49 INFO     Waiting for transaction 0x7ddaf4346abeb6e280751d1fc35dc15e424c58508b618eba4294f2c79ce598f7 confirmation
2025-11-14 08:27:02 INFO     Successfully harvested
2025-11-14 08:51:39 INFO     Scanned 1 DepositEvent events, block 1618288 of 1618288
2025-11-14 08:58:03 INFO     Scanned 1 DepositEvent events, block 1618318 of 1618318

Docker Image: v4.0.11
Parameters:

      start-hashi-vault
      --network
      hoodi
      --hashi-vault-url
      http://vault:8200
      --hashi-vault-token
      $(HASHICORP_VAULT_TOKEN)
      --hashi-vault-key-path
      stakewise
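
One way to test whether the retry path is what retains memory would be to compare `tracemalloc` snapshots taken before and after a burst of retries. The following is only a minimal sketch using the Python standard library: `leaky_retry` and `top_growth` are hypothetical names, and the loop is a stand-in for a failing request path, not the operator's actual code.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew between snapshots."""
    # compare_to() sorts by absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


def leaky_retry(store, attempts):
    """Hypothetical stand-in for a retry path that keeps per-attempt state alive."""
    for _ in range(attempts):
        store.append(bytearray(64 * 1024))  # 64 KiB retained per attempt


tracemalloc.start()
retained = []
before = tracemalloc.take_snapshot()
leaky_retry(retained, attempts=50)
after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
tracemalloc.stop()

# Each entry shows file, line number, and how much its allocations grew.
for stat in growth:
    print(stat)
```

In a long-running service, the snapshots would more plausibly be taken on a timer (for example, every few minutes) and the diff logged, so that growth can be lined up with the retry bursts visible in the logs above.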
