OTS 1.6 EUD TLS Handler / RabbitMQ Cleanup Bug
Environment
- OpenTAKServer: 1.6.0
- RabbitMQ: local instance on 127.0.0.1:5672, default guest/guest, with cot, dms, and groups exchanges.
- Clients: iTAK 2.12.3.651 EUDs (multiple usernames).
- Network:
  - OTS HTTP: http://<vm>:8081
  - EUD TCP: 0.0.0.0:8088
  - EUD TLS: 0.0.0.0:8089
Symptom
After some time running under normal load, the EUD TLS handler stops accepting device logins reliably:
- iTAK clients that have been connecting for days with the same settings suddenly fail to log in.
- The OTS HTTP UI remains healthy:
  - GET /api/login returns the login page.
  - POST /api/login with admin credentials works (302 redirect).
  - /api/eud returns valid JSON and shows existing EUD entries (e.g., magnum6, red1).
- From the OTS host:
  - The EUD TLS port 8089 still appears as LISTEN in ss -tulpn.
  - Before restart, openssl s_client -connect 127.0.0.1:8089 often hangs or fails; after restart, it shows a valid TLS handshake and the opentakserver cert.
- Devices require no configuration change; they automatically reconnect once OTS is restarted.
Logs and Errors Observed
1. EUD handler cleanup exceptions (KeyError: 'queue')
In /tmp/ots16_eud_handler_ssl.log we see repeated exceptions like:
Exception in thread Thread-XXX:
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File ".../opentakserver/eud_handler/client_controller.py", line 266, in run
self.close_connection()
File ".../opentakserver/eud_handler/client_controller.py", line 309, in close_connection
self.unbind_rabbitmq_queues()
File ".../opentakserver/eud_handler/client_controller.py", line 536, in unbind_rabbitmq_queues
self.rabbit_channel.queue_unbind(exchange=bind['exchange'], queue=bind['queue'], routing_key=bind['routing_key'])
~~~~^^^^^^^^^
KeyError: 'queue'
This indicates that during disconnect, ClientController.unbind_rabbitmq_queues() is iterating self.bound_queues and encountering entries without a 'queue' key.
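The failure can be reproduced standalone, without RabbitMQ or OTS at all; the names below are illustrative, but the access pattern matches the unbind loop in the traceback:

```python
# Minimal repro: one entry without a 'queue' key triggers the same
# KeyError the EUD handler log shows during cleanup.
bound_queues = [
    {"exchange": "dms", "routing_key": "uid-1", "queue": "uid-1"},  # well-formed
    {"exchange": "groups", "routing_key": "__ANON__.OUT"},          # malformed: no 'queue'
]

error = None
for bind in bound_queues:
    try:
        # Same subscript pattern as client_controller.py line 536
        _ = (bind["exchange"], bind["queue"], bind["routing_key"])
    except KeyError as exc:
        error = exc
        break

print(error)  # 'queue'
```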
2. EUD handler socket warnings and OSErrors
In ots15/logs/opentakserver.log.2025-12-04, we see:
[2025-12-03 20:28:24,659] - eud_handler[...] - client_controller - run - 265 - WARNING - No Data Closing connection to 127.0.0.1
[2025-12-03 20:28:24,692] - eud_handler[...] - client_controller - unbind_rabbitmq_queues - 535 - INFO - Unbinding {'exchange': 'groups', 'routing_key': '__ANON__.OUT'}
...
[2025-12-03 21:11:53,579] - eud_handler[...] - client_controller - on_message - 152 - ERROR - red1: EOF occurred in violation of protocol (_ssl.c:2406), closing socket
...
[2025-12-03 17:49:47,625] - eud_handler[...] - client_controller - run - 265 - WARNING - No Data Closing connection to 127.0.0.1
[2025-12-03 17:49:47,642] - eud_handler[...] - client_controller - send_disconnect_cot - 566 - INFO - 127.0.0.1 disconnected
These appear around the times when TLS EUD connections (including headless clients) disconnect. They often precede or accompany the KeyError: 'queue' exceptions above.
3. TLS 8089 behavior before vs after restart
Before restarting the EUD handlers:
- nc -vz 40.76.106.80 8089 and openssl s_client -connect 40.76.106.80:8089 from a remote host frequently timed out.
- iTAK clients could no longer complete a TLS login.
After restarting:
openssl s_client -connect 127.0.0.1:8089 -servername opentakserver -brief </dev/null
produces:
depth=1 C = WW, ST = XX, L = YY, O = ZZ, OU = OpenTAKServer, CN = OpenTAKServer-CA
verify error:num=19:self-signed certificate in certificate chain
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
Peer certificate: C = WW, ST = XX, L = YY, O = ZZ, OU = OpenTAKServer, CN = opentakserver
...
and the devices reconnect without any configuration change.
Code Analysis – bound_queues and Cleanup Bug
The relevant code in opentakserver/eud_handler/client_controller.py (OTS 1.6.0) looks like:
class ClientController(Thread):
    def __init__(...):
        ...
        self.bound_queues = []
        ...
        if self.is_ssl:
            try:
                self.sock.do_handshake()
                ...
            except BaseException as e:
                logger.warning("Failed to do handshake: {}".format(e))
                self.logger.error(traceback.format_exc())
                self.close_connection()
...
def unbind_rabbitmq_queues(self):
    if self.uid and self.rabbit_channel and not self.rabbit_channel.is_closing and not self.rabbit_channel.is_closed:
        self.rabbit_channel.queue_unbind(queue=self.uid, exchange="missions", routing_key="missions")
        self.rabbit_channel.queue_unbind(queue=self.uid, exchange="groups")
        with self.app.app_context():
            missions = db.session.execute(db.session.query(Mission)).all()
            for mission in missions:
                self.rabbit_channel.queue_unbind(queue=self.uid, exchange="missions", routing_key=f"missions.{mission[0].name}")
                self.logger.debug(f"Unbound {self.uid} from mission.{mission[0].name}")
        for bind in self.bound_queues:
            self.logger.info(f"Unbinding {bind}")
            self.rabbit_channel.queue_unbind(exchange=bind['exchange'], queue=bind['queue'], routing_key=bind['routing_key'])
And a key part of the setup path where bound_queues is populated:
with self.app.app_context():
    group_memberships = db.session.execute(
        db.session.query(GroupUser).filter_by(user_id=self.user.id, direction=Group.OUT)
    ).all()

    if not group_memberships or not self.is_ssl:
        self.logger.debug(f"{self.callsign} doesn't belong to any groups, adding them to the __ANON__ group")
        self.rabbit_channel.queue_bind(exchange="groups", queue=self.uid, routing_key="__ANON__.OUT")
        if {"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": self.uid} not in self.bound_queues:
            # BUG: 'queue' key missing in the appended dict
            self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT"})
    elif group_memberships and self.is_ssl:
        for membership in group_memberships:
            membership = membership[0]
            self.rabbit_channel.queue_bind(exchange="groups", queue=self.uid, routing_key=f"{membership.group.name}.OUT")
            if {"exchange": "groups", "routing_key": f"{membership.group.name}.OUT", "queue": self.uid} not in self.bound_queues:
                self.bound_queues.append({"exchange": "groups", "routing_key": f"{membership.group.name}.OUT", "queue": self.uid})

    self.rabbit_channel.queue_bind(exchange='missions', routing_key="missions", queue=self.uid)
    if {"exchange": "missions", "routing_key": "missions", "queue": self.uid} not in self.bound_queues:
        # BUG: appends a groups/__ANON__ entry instead of the missions entry checked above
        self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": self.uid})

    # DM bindings:
    self.rabbit_channel.queue_bind(exchange='dms', queue=self.uid, routing_key=self.uid)
    self.rabbit_channel.queue_bind(exchange='dms', queue=self.callsign, routing_key=self.callsign)
    if {"exchange": "dms", "routing_key": self.uid, "queue": self.uid} not in self.bound_queues:
        self.bound_queues.append({"exchange": "dms", "routing_key": self.uid, "queue": self.uid})
    if {"exchange": "dms", "routing_key": self.callsign, "queue": self.callsign} not in self.bound_queues:
        self.bound_queues.append({"exchange": "dms", "routing_key": self.callsign, "queue": self.callsign})
Problems we see:
- In the not group_memberships or not self.is_ssl branch, the dict appended to bound_queues is missing 'queue'.
- In the missions binding section, bound_queues.append({"exchange": "groups", ...}) is called inside a guard for {"exchange": "missions", ...} – likely a copy/paste error.
- unbind_rabbitmq_queues() assumes all entries in bound_queues have a 'queue' key and raises KeyError when it hits the malformed entry.
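A related side effect: the dedupe guard compares a fully-formed three-key dict against the list, while the append stores a two-key dict, so the guard can never match the malformed form it created. A toy loop (illustrative names, run in isolation) makes the mismatch visible:

```python
# The membership test uses a dict WITH 'queue'; the append stores a dict
# WITHOUT it. Dict equality requires identical keys, so the guard never
# matches and a fresh malformed entry is appended on every pass.
bound_queues = []
uid = "ANDROID-123"  # hypothetical EUD uid

for _ in range(3):
    if {"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": uid} not in bound_queues:
        bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT"})  # missing 'queue'

print(len(bound_queues))  # 3 -- never deduplicated
```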
Trigger Pattern
We are not doing anything exotic; the pattern that triggers this is:
- Create users via /api/user/add (e.g., "S3 STAFF", "NOMAD 4", magnum6, red1, agent).
- Do not assign OUT groups to those users (no GroupUser rows with direction=Group.OUT for them).
- Have iTAK devices log in via TLS on 8089 as those users and then disconnect later (normal behavior: app closed, idle timeout, Wi-Fi flaps).
- On disconnect, ClientController.close_connection() calls unbind_rabbitmq_queues(), which iterates self.bound_queues and hits the malformed entries, throwing KeyError: 'queue'.
- Over time, enough handler threads crash that the EUD TLS handler is effectively broken; new TLS connections to 8089 start timing out or failing until OTS is restarted.
The issue is driven by normal TLS EUD connect/disconnect cycles for users with no OUT group memberships.
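The thread-level failure mode can be sketched without OTS or RabbitMQ: an exception escaping a handler thread kills only that thread, which is consistent with 8089 staying in LISTEN while logins degrade. This toy (Python 3.8+) uses threading.excepthook to capture what _bootstrap_inner would otherwise just print to stderr:

```python
import threading

captured = []
# Capture uncaught thread exceptions instead of letting them print to stderr.
threading.excepthook = lambda args: captured.append(args.exc_type.__name__)

def handler():
    # Stand-in for a ClientController cleanup pass over a malformed entry.
    bound_queues = [{"exchange": "groups", "routing_key": "__ANON__.OUT"}]
    for bind in bound_queues:
        _ = bind["queue"]  # raises KeyError; only this thread dies

t = threading.Thread(target=handler)
t.start()
t.join()
print(captured)  # ['KeyError'] -- the thread died, the process kept running
```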
Proposed fix
- Ensure all entries appended to self.bound_queues include a 'queue' key.
- Correct the missions block to append a {"exchange": "missions", ...} entry instead of a groups entry.
- Make unbind_rabbitmq_queues() resilient to malformed entries (e.g., skip entries missing 'queue', log, and continue), so a bad dict does not crash the handler.
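One way to implement the first two bullets is to route every append through a small helper so a record can never be stored without all three keys. This is a sketch with hypothetical names, not code from OTS:

```python
def track_binding(bound_queues, exchange, routing_key, queue):
    """Record a binding exactly once, always with all three keys.

    Hypothetical helper; OTS 1.6.0 appends the dicts inline instead.
    """
    entry = {"exchange": exchange, "routing_key": routing_key, "queue": queue}
    if entry not in bound_queues:
        bound_queues.append(entry)
    return entry

bound = []
track_binding(bound, "groups", "__ANON__.OUT", "uid-1")
track_binding(bound, "groups", "__ANON__.OUT", "uid-1")  # deduplicated
track_binding(bound, "missions", "missions", "uid-1")    # correct exchange
print(len(bound))  # 2
```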
OTS 1.6 EUD TLS Handler / RabbitMQ Cleanup Bug
We’re seeing the EUD TLS handler in OTS 1.6.0 become unstable over time: TLS logins on 8089 for all iTAK devices suddenly start failing until OTS is restarted. The underlying cause appears to be a bug in ClientController.unbind_rabbitmq_queues when cleaning up RabbitMQ bindings.
Environment
- OpenTAKServer: 1.6.0
- RabbitMQ: 127.0.0.1:5672, exchanges cot, dms, and groups.
- Network:
  - HTTP/REST: http://<host>:8081
  - EUD TCP: 0.0.0.0:8088
  - EUD TLS: 0.0.0.0:8089
- Clients: iTAK 2.12.3.651 EUDs (multiple users created via /api/user/add).
Symptoms
After some number of EUD connect/disconnect cycles:
- iTAK clients that had been logging in successfully stop being able to log in via TLS on 8089.
- OTS HTTP remains healthy:
  - GET /api/login returns the login form.
  - POST /api/login with admin credentials succeeds.
  - GET /api/eud returns 200 and valid JSON.
- From the server:
  - ss -tulpn shows 0.0.0.0:8089 still listening.
  - Before restart, openssl s_client -connect 127.0.0.1:8089 often hangs or errors; after restarting opentakserver + eud_handler + cot_parser, it completes the TLS handshake and devices reconnect without configuration changes.
So the listener is bound, but the EUD TLS handler stops functioning correctly until processes are restarted.
Logs and errors
EUD handler cleanup exceptions
In the TLS EUD handler log (eud_handler --ssl) we see repeated exceptions like:
Exception in thread Thread-XXX:
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File ".../opentakserver/eud_handler/client_controller.py", line 266, in run
self.close_connection()
File ".../opentakserver/eud_handler/client_controller.py", line 309, in close_connection
self.unbind_rabbitmq_queues()
File ".../opentakserver/eud_handler/client_controller.py", line 536, in unbind_rabbitmq_queues
self.rabbit_channel.queue_unbind(exchange=bind['exchange'], queue=bind['queue'], routing_key=bind['routing_key'])
~~~~^^^^^^^^^
KeyError: 'queue'
These occur during disconnect, when cleaning up RabbitMQ bindings.
EUD handler socket warnings
In opentakserver.log we see the usual disconnect noise:
[... WARNING - No Data Closing connection to <ip>]
[... INFO - <ip> disconnected]
[... ERROR - <callsign>: EOF occurred in violation of protocol (_ssl.c:2406), closing socket]
These often precede or accompany the KeyError: 'queue' exceptions. After enough of these, new TLS logins stop working until OTS is restarted.
Code analysis (OTS 1.6.0)
In opentakserver/eud_handler/client_controller.py, ClientController tracks RabbitMQ bindings in self.bound_queues and later unbinds them:
class ClientController(Thread):
    def __init__(...):
        ...
        self.bound_queues = []
        ...
Populating bound_queues
During EUD setup, once the user and EUD UID are known:
with self.app.app_context():
    group_memberships = db.session.execute(
        db.session.query(GroupUser).filter_by(user_id=self.user.id, direction=Group.OUT)
    ).all()

    if not group_memberships or not self.is_ssl:
        self.logger.debug(f"{self.callsign} doesn't belong to any groups, adding them to the __ANON__ group")
        self.rabbit_channel.queue_bind(exchange="groups", queue=self.uid, routing_key="__ANON__.OUT")
        if {"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": self.uid} not in self.bound_queues:
            # BUG: appended dict is missing 'queue'
            self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT"})
    elif group_memberships and self.is_ssl:
        for membership in group_memberships:
            membership = membership[0]
            self.rabbit_channel.queue_bind(exchange="groups", queue=self.uid, routing_key=f"{membership.group.name}.OUT")
            if {"exchange": "groups", "routing_key": f"{membership.group.name}.OUT", "queue": self.uid} not in self.bound_queues:
                self.bound_queues.append({"exchange": "groups", "routing_key": f"{membership.group.name}.OUT", "queue": self.uid})

    self.rabbit_channel.queue_bind(exchange='missions', routing_key="missions", queue=self.uid)
    if {"exchange": "missions", "routing_key": "missions", "queue": self.uid} not in self.bound_queues:
        # BUG: appends 'groups' entry instead of 'missions'
        self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": self.uid})

    # DM bindings:
    self.rabbit_channel.queue_bind(exchange='dms', queue=self.uid, routing_key=self.uid)
    self.rabbit_channel.queue_bind(exchange='dms', queue=self.callsign, routing_key=self.callsign)
    if {"exchange": "dms", "routing_key": self.uid, "queue": self.uid} not in self.bound_queues:
        self.bound_queues.append({"exchange": "dms", "routing_key": self.uid, "queue": self.uid})
    if {"exchange": "dms", "routing_key": self.callsign, "queue": self.callsign} not in self.bound_queues:
        self.bound_queues.append({"exchange": "dms", "routing_key": self.callsign, "queue": self.callsign})
Cleanup
def unbind_rabbitmq_queues(self):
    if self.uid and self.rabbit_channel and not self.rabbit_channel.is_closing and not self.rabbit_channel.is_closed:
        self.rabbit_channel.queue_unbind(queue=self.uid, exchange="missions", routing_key="missions")
        self.rabbit_channel.queue_unbind(queue=self.uid, exchange="groups")
        with self.app.app_context():
            missions = db.session.execute(db.session.query(Mission)).all()
            for mission in missions:
                self.rabbit_channel.queue_unbind(queue=self.uid, exchange="missions", routing_key=f"missions.{mission[0].name}")
                self.logger.debug(f"Unbound {self.uid} from mission.{mission[0].name}")
        for bind in self.bound_queues:
            self.logger.info(f"Unbinding {bind}")
            self.rabbit_channel.queue_unbind(
                exchange=bind['exchange'],
                queue=bind['queue'],
                routing_key=bind['routing_key'],
            )
Problems
- In the if not group_memberships or not self.is_ssl branch, the dict appended to bound_queues is missing 'queue'. Later, unbind_rabbitmq_queues() blindly does bind['queue'], causing KeyError: 'queue'.
- In the missions binding block, the condition checks for a "missions" entry, but the append uses "exchange": "groups" with "__ANON__.OUT", which appears to be a typo and makes bound_queues inconsistent.
Repeated connect/disconnect cycles for users that hit the not group_memberships or not self.is_ssl branch (e.g., EUD users with no OUT group memberships) will populate bound_queues with malformed entries and eventually crash multiple handler threads during cleanup. That matches the KeyError: 'queue' stack traces and the degradation we see in the TLS EUD handler.
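An alternative hardening direction (a sketch, not OTS code) is to validate binding records at append time, so a malformed entry fails fast where it is created instead of crashing disconnect cleanup minutes or hours later:

```python
# Fail fast at bind time instead of crashing at cleanup time.
REQUIRED = ("exchange", "routing_key", "queue")

def validate_binding(entry):
    """Raise early if a binding record is missing required keys (sketch)."""
    missing = [k for k in REQUIRED if k not in entry]
    if missing:
        raise ValueError(f"binding record missing keys: {missing}")
    return entry

ok = validate_binding({"exchange": "dms", "routing_key": "u1", "queue": "u1"})

err = None
try:
    validate_binding({"exchange": "groups", "routing_key": "__ANON__.OUT"})
except ValueError as exc:
    err = exc

print(err)  # binding record missing keys: ['queue']
```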
Proposed fix
- When appending to self.bound_queues, always include a 'queue' key:

      self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT", "queue": self.uid})

  instead of:

      self.bound_queues.append({"exchange": "groups", "routing_key": "__ANON__.OUT"})

- In the missions block, append a "missions" entry instead of a "groups" entry:

      if {"exchange": "missions", "routing_key": "missions", "queue": self.uid} not in self.bound_queues:
          self.bound_queues.append({"exchange": "missions", "routing_key": "missions", "queue": self.uid})

- Optionally harden unbind_rabbitmq_queues() to skip any malformed entries:

      for bind in self.bound_queues:
          if 'queue' not in bind:
              self.logger.warning(f"Skipping malformed bind entry: {bind}")
              continue
          self.rabbit_channel.queue_unbind(
              exchange=bind['exchange'],
              queue=bind['queue'],
              routing_key=bind['routing_key'],
          )
These changes should prevent KeyError: 'queue' during cleanup and stop the EUD TLS handler from degrading over time under normal EUD connect/disconnect patterns.
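The hardened loop can be regression-tested against a stub channel instead of a live broker; everything below is illustrative, not OTS code:

```python
# Stub stands in for a pika channel: records unbind calls instead of
# talking to RabbitMQ, so the cleanup logic can be tested in isolation.
class StubChannel:
    def __init__(self):
        self.unbound = []

    def queue_unbind(self, exchange, queue, routing_key):
        self.unbound.append((exchange, queue, routing_key))

def unbind_all(channel, bound_queues):
    """Hardened cleanup: skip malformed entries instead of crashing."""
    for bind in bound_queues:
        if 'queue' not in bind:
            continue  # a bad record no longer kills the handler thread
        channel.queue_unbind(exchange=bind['exchange'], queue=bind['queue'], routing_key=bind['routing_key'])

channel = StubChannel()
unbind_all(channel, [
    {"exchange": "groups", "routing_key": "__ANON__.OUT"},         # malformed, skipped
    {"exchange": "dms", "routing_key": "uid-1", "queue": "uid-1"}, # well-formed, unbound
])
print(channel.unbound)  # [('dms', 'uid-1', 'uid-1')]
```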