Description
On 2nd January we investigated an issue related to InfluxDB query usage (see internal post-mortem here). The root cause of this turned out to be that the handle_api_usage_notifications recurring task had been executing almost constantly, causing our usage to increase significantly.
This issue covers one piece of the follow-up from that incident: a bug in our Task Processor which can cause (recurring only?) tasks to get stuck in a constant execution loop.
As I understand it, the behaviour was as follows:
- Task was executed at the normal scheduled time
- Task executed successfully, but failed to unlock itself (see Sentry issue here)
- Since it failed to unlock itself, it also never wrote the RecurringTaskRun object, and hence was never marked as 'completed'
- After the timeout expired (which I assume is the default of 30 minutes, since it isn't defined otherwise here), the task runner automatically unlocked the task and tried to run it again
- The cycle repeats (sketched below)
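To make the loop concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the task processor's actual code: the field names (is_locked, locked_at), the completed-run list standing in for RecurringTaskRun, and the 30-minute default are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

LOCK_TIMEOUT = timedelta(minutes=30)  # assumed default when no per-task timeout is defined


@dataclass
class RecurringTask:
    # Hypothetical fields; the real task-processor models may differ.
    name: str
    is_locked: bool = False
    locked_at: datetime | None = None
    completed_runs: list[datetime] = field(default_factory=list)

    def lock(self, now: datetime) -> None:
        self.is_locked = True
        self.locked_at = now

    def unlock(self) -> None:
        self.is_locked = False
        self.locked_at = None

    def lock_expired(self, now: datetime) -> bool:
        return self.is_locked and self.locked_at is not None and now - self.locked_at > LOCK_TIMEOUT


def run_recurring_task(task: RecurringTask, now: datetime, unlock_fails: bool) -> None:
    """One pass of the runner, mirroring the behaviour described above."""
    if task.is_locked and not task.lock_expired(now):
        return  # another runner holds the lock; skip
    if task.lock_expired(now):
        task.unlock()  # runner auto-unlocks after the timeout and tries again

    task.lock(now)
    # ... task body executes successfully here ...
    try:
        if unlock_fails:
            raise TimeoutError("simulated statement_timeout on the unlock query")
        task.unlock()
        task.completed_runs.append(now)  # 'completed' record written only on this path
    except TimeoutError:
        # Task stays locked and no completed run is recorded; after LOCK_TIMEOUT
        # the next pass unlocks it and executes it again -> constant loop.
        pass


if __name__ == "__main__":
    task = RecurringTask(name="handle_api_usage_notifications")
    now = datetime(2024, 1, 2, 0, 0)
    for _ in range(3):
        run_recurring_task(task, now, unlock_fails=True)
        now += LOCK_TIMEOUT + timedelta(seconds=1)
    print(f"completed runs: {len(task.completed_runs)}, still locked: {task.is_locked}")
    # -> completed runs: 0, still locked: True (the task keeps re-executing every ~30 mins)
```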
Given the timing and nature of this issue, our assumption is that it is related to the changes made to various timeout settings in our RDS configuration (context here, copied below for posterity). These settings were updated on 12th December to address an unrelated performance issue relating to segment updates, which aligns with the Sentry error pattern.
statement_timeout → 60000ms (1 min)
idle_in_transaction_session_timeout → 300000ms (5 min)
lock_timeout → 30000ms (30 sec)
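To confirm which of these values the task processor's database connections actually see at runtime, a quick check along these lines should work (the DSN is a placeholder and psycopg2 is assumed; adjust for the real connection settings):

```python
import psycopg2

# Placeholder DSN; substitute the real RDS host and credentials.
conn = psycopg2.connect("postgresql://user:password@rds-host:5432/flagsmith")
with conn, conn.cursor() as cur:
    for setting in (
        "statement_timeout",
        "idle_in_transaction_session_timeout",
        "lock_timeout",
    ):
        # current_setting() returns the effective value for this session.
        cur.execute("SELECT current_setting(%s)", (setting,))
        print(setting, "=", cur.fetchone()[0])
conn.close()
```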