Recurring tasks can get stuck in a constant loop of execution #152

@matthewelwell

On 2nd January we investigated an issue related to InfluxDB query usage (see internal post-mortem here). The root cause turned out to be that the handle_api_usage_notifications recurring task had been executing almost constantly, which significantly increased our usage.

This issue covers one piece of the follow-up from that investigation: a problem in our Task Processor which can cause (recurring only?) tasks to get stuck in a constant execution loop.

As I understand it, the behaviour was as follows:

  1. Task was executed at the normal scheduled time
  2. Task executed successfully, but failed to unlock itself (see Sentry issue here)
  3. Since it failed to unlock itself, it also never wrote the RecurringTaskRun object, and hence wasn't marked as 'completed'
  4. After the timeout expired (which I assume is the default of 30 minutes, since it isn't defined otherwise here), the task runner automatically unlocked the task and tried to run it again
  5. The cycle repeats
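
To make the sequence above concrete, here is a toy simulation of the loop. It is only a sketch and does not use the real Task Processor code: the `RecurringTask` class, `tick`, `simulate`, the 12 hour `RUN_EVERY` schedule and the one minute polling interval are all hypothetical, and `TASK_TIMEOUT` is the 30 minute default assumed in step 4.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

TASK_TIMEOUT = timedelta(minutes=30)  # assumed default timeout (step 4)
RUN_EVERY = timedelta(hours=12)       # hypothetical schedule for the task


@dataclass
class RecurringTask:
    """Hypothetical stand-in for a recurring task and its run history."""
    is_locked: bool = False
    locked_at: Optional[datetime] = None
    # Stands in for RecurringTaskRun rows: one timestamp per completed run.
    completed_runs: List[datetime] = field(default_factory=list)
    executions: int = 0

    def should_execute(self, now: datetime) -> bool:
        if self.is_locked:
            return False
        if not self.completed_runs:
            return True  # never recorded as completed, so it looks due
        return now - self.completed_runs[-1] >= RUN_EVERY


def tick(task: RecurringTask, now: datetime, unlock_fails: bool) -> None:
    """One pass of a simplified task runner."""
    # Step 4: the runner releases any lock older than the timeout.
    if task.is_locked and now - task.locked_at >= TASK_TIMEOUT:
        task.is_locked = False
        task.locked_at = None

    if not task.should_execute(now):
        return

    # Step 1: lock and execute (the task body itself succeeds).
    task.is_locked = True
    task.locked_at = now
    task.executions += 1

    # Steps 2 and 3: if unlocking fails, no run is recorded either.
    if unlock_fails:
        return
    task.is_locked = False
    task.locked_at = None
    task.completed_runs.append(now)


def simulate(hours: int, unlock_fails: bool) -> int:
    """Poll once a minute for `hours` hours; return the execution count."""
    task = RecurringTask()
    now = datetime(2000, 1, 1)  # arbitrary start time
    end = now + timedelta(hours=hours)
    while now < end:
        tick(task, now, unlock_fails)
        now += timedelta(minutes=1)
    return task.executions
```

Under these assumptions, `simulate(24, unlock_fails=False)` returns 2 (one run per 12 hour period), while `simulate(24, unlock_fails=True)` returns 48, i.e. one execution every time the 30 minute timeout releases the lock.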

Given the timing and nature of this issue, our assumption is that it is related to the changes that were made to various timeout settings in our RDS configuration (context here, and copied below for posterity). These settings were updated on 12th December in response to an unrelated performance issue relating to segment updates, and that date aligns with the Sentry error pattern.

statement_timeout → 60000ms (1 min)
idle_in_transaction_session_timeout → 300000ms (5 min)
lock_timeout → 30000ms (30 sec)
