ELBE SOAP daemon hang #456

@tobiasjakobi-lr

Hello,

I'm reporting a pesky problem that is hard to reproduce. We first noticed it around mid-2025.

When it occurs, the SOAP daemon still appears to work on the surface (elbe control list_projects returns something), but it no longer actually does anything, such as running builds.

Nothing in the logs hints at what could be causing this. It happens randomly, and sometimes it takes weeks to show up.

Hence I added the following lines to /usr/bin/elbe:

import signal
import faulthandler

faulthandler.register(signal.SIGUSR1)

I managed to get a backtrace the last time this happened.

Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bbcb306c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/log.py", line 211 in __run
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/log.py", line 199 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bb7dff6c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/log.py", line 211 in __run
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/log.py", line 199 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bbd4316c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 359 in wait
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/queue.py", line 202 in get
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/asyncworker.py", line 298 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Current thread 0x00007f9bc11d8100 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/selectors.py", line 398 in select
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3.13/socketserver.py", line 235 in serve_forever
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/commands/daemon.py", line 99 in run_command
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/lib/python3/dist-packages/elbepack/main.py", line 43 in main
Jan 08 15:39:04 elbe-daemon elbe[1758342]:   File "/usr/bin/elbe", line 38 in <module>
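
For anyone who wants to try the same mechanism outside of ELBE, here is a minimal sketch of how such a dump is produced. The names dump_on_sigusr1 and demo, the temporary file, and the idle worker thread are made up for illustration and are not part of ELBE; the worker only stands in for a thread blocked on an empty queue, similar to the asyncworker.py frame above. It assumes a POSIX system where SIGUSR1 exists.

```python
import faulthandler
import os
import queue
import signal
import tempfile
import threading
import time


def dump_on_sigusr1(log_file):
    """Make SIGUSR1 write the tracebacks of all threads to log_file."""
    # all_threads=True is already the default; spelled out for clarity.
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)


def demo():
    """Hypothetical demo: dump a process that has one idle worker thread."""
    jobs = queue.Queue()
    # Stand-in for a worker that blocks in queue.get() while it has no work.
    worker = threading.Thread(target=jobs.get, daemon=True)
    worker.start()
    time.sleep(0.1)  # give the worker time to block

    with tempfile.TemporaryFile(mode="w+") as log_file:
        dump_on_sigusr1(log_file)
        # Same effect as running `kill -USR1 <pid>` from another shell.
        os.kill(os.getpid(), signal.SIGUSR1)
        time.sleep(0.1)  # the signal may be handled by another thread
        faulthandler.unregister(signal.SIGUSR1)
        log_file.seek(0)
        dump = log_file.read()

    jobs.put(None)  # release the worker again
    return dump


if __name__ == "__main__":
    print(demo())
```

Without the file argument, faulthandler writes to sys.stderr, which is presumably why the dump above ended up in the journal of the systemd unit. A thread that shows queue.py get followed by threading.py wait is simply idle and waiting for the next job.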

This is inside a trixie initvm.

root@elbe-daemon:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.2
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@elbe-daemon:~# elbe --version
elbe v15.7

My naive guess is that it's somehow related to how ELBE performs logging, since two of the threads above are sitting in elbepack/log.py.

Restarting the corresponding systemd unit makes the problem go away. But it's annoying, as it results in pipeline failures/timeouts and is nearly impossible to detect externally (as I said, elbe control list_projects still works fine).

Any advice on how to get to the bottom of this?

With best wishes,
Tobias
