Description
Hello,
I'm reporting a pesky problem that is hard to reproduce. We first noticed it around mid-2025.
When it occurs, the SOAP daemon still appears to work superficially (elbe control list_projects returns something), but it no longer actually does anything, such as building.
Nothing in the logs hints at what could be causing this. It happens randomly, and sometimes it takes weeks to show up.
Hence I added the following lines to /usr/bin/elbe:
import signal
import faulthandler
faulthandler.register(signal.SIGUSR1)
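For anyone who wants to try the same approach outside the daemon, here is a minimal, self-contained sketch of it. The function name capture_dump_via_sigusr1 and the temporary file are only for illustration. In the daemon the dump goes to stderr and the signal is sent from outside the process.

```python
import faulthandler
import signal
import tempfile


def capture_dump_via_sigusr1() -> str:
    """Register faulthandler on SIGUSR1, signal ourselves, return the dump."""
    with tempfile.TemporaryFile(mode="w+") as dump_file:
        # all_threads=True (the default) dumps every thread, not only the
        # thread that happens to receive the signal.
        faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True)
        try:
            # Stand-in for an external `kill -USR1 <pid>`; the handler runs
            # at C level, so it works even if the interpreter is stuck.
            signal.raise_signal(signal.SIGUSR1)
            dump_file.seek(0)
            return dump_file.read()
        finally:
            # Unregister before the file is closed, because faulthandler
            # keeps using the file descriptor until then.
            faulthandler.unregister(signal.SIGUSR1)


if __name__ == "__main__":
    print(capture_dump_via_sigusr1())
```

On the real daemon the equivalent trigger is kill -USR1 with the PID of the elbe process. The dump then lands wherever the unit's stderr goes, which is the journal in my setup.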
I managed to get a backtrace the last time this happened.
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bbcb306c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/log.py", line 211 in __run
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/log.py", line 199 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bb7dff6c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/log.py", line 211 in __run
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/log.py", line 199 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Thread 0x00007f9bbd4316c0 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 359 in wait
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/queue.py", line 202 in get
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/asyncworker.py", line 298 in run
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1043 in _bootstrap_inner
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/threading.py", line 1014 in _bootstrap
Jan 08 15:39:04 elbe-daemon elbe[1758342]: Current thread 0x00007f9bc11d8100 (most recent call first):
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/selectors.py", line 398 in select
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3.13/socketserver.py", line 235 in serve_forever
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/commands/daemon.py", line 99 in run_command
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/lib/python3/dist-packages/elbepack/main.py", line 43 in main
Jan 08 15:39:04 elbe-daemon elbe[1758342]: File "/usr/bin/elbe", line 38 in <module>
This is inside a trixie initvm.
root@elbe-daemon:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.2
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@elbe-daemon:~# elbe --version
elbe v15.7
My naive guess would be that it's somehow related to how ELBE is performing logging?
Restarting the corresponding systemd unit makes the problem go away. But it's annoying, as it results in pipeline failures/timeouts and is nearly impossible to detect externally (like I said, elbe control list_projects still works fine).
Any advice on how to get to the bottom of this?
With best wishes,
Tobias