Skip to content

Flaky webhook subscriber timeout test globally counts asyncio.sleep calls #3791

@enyst

Description

@enyst

Summary

tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling is flaky in the full agent-server suite because it globally patches asyncio.sleep and then asserts the raw global mock call count.

Evidence

Main failed on push in Run tests:

Failure excerpt:

FAILED tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling - AssertionError: assert 1594 == 2
 +  where 1594 = <AsyncMock name='sleep' id='140409751330096'>.call_count
============ 1 failed, 1248 passed, 39 warnings in 70.82s (0:01:10) ============

The failing test patches the process-global asyncio.sleep:

with patch("asyncio.sleep") as mock_sleep:
    await subscriber._post_events()

assert mock_client.request.call_count == 3
assert mock_sleep.call_count == 2

But other webhook subscriber tests create background flush timer tasks (WebhookSubscriber._flush_after_delay) that also call asyncio.sleep. When those tasks are still alive during this test, their sleeps are counted by the same global mock.

I also verified that the test passes in isolation on current origin/main, which supports this being suite-order/background-task pollution rather than deterministic behavior in the test itself:

git checkout origin/main --quiet
uv run pytest tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling -q

Result:

1 passed, 5 warnings in 0.23s

Nearby tests in the same file already acknowledge this pattern and avoid asserting raw global sleep count by collecting sleep delays and filtering to webhook_spec.retry_delay.

Proposed fixes

A few possible fixes, from smallest to more structural:

  1. Smallest test-only fix: change test_timeout_error_handling (and likely test_network_error_handling) to use a side-effect that records delays, then assert only retry-delay sleeps:

    sleep_calls = []
    
    async def mock_sleep(delay):
        sleep_calls.append(delay)
    
    with patch("asyncio.sleep", side_effect=mock_sleep):
        await subscriber._post_events()
    
    retry_sleeps = [d for d in sleep_calls if d == webhook_spec.retry_delay]
    assert len(retry_sleeps) == 2
  2. Clean up background timers in tests: make tests that call await subscriber(event) and leave the queue below event_buffer_size also call await subscriber.close() or explicitly cancel/await subscriber._flush_timer in teardown. This prevents leaked _flush_after_delay() tasks from reaching later tests.

  3. Better production/test seam: inject a sleep callable into WebhookSubscriber (defaulting to asyncio.sleep) so retry tests can patch only the subscriber instance instead of the module-global asyncio.sleep.

Notes

This surfaced on a main push after an unrelated agent-profile naming PR because that push ran the full tests/agent_server suite under xdist. The failed PR did not touch webhook subscriber logic.

This issue was created by an AI agent (OpenHands) on behalf of the user.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions