Summary
tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling is flaky in the full agent-server suite because it globally patches asyncio.sleep and then asserts the raw global mock call count.
Evidence
Main failed on push in Run tests:
Failure excerpt:
FAILED tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling - AssertionError: assert 1594 == 2
+ where 1594 = <AsyncMock name='sleep' id='140409751330096'>.call_count
============ 1 failed, 1248 passed, 39 warnings in 70.82s (0:01:10) ============
The failing test patches the process-global asyncio.sleep:
with patch("asyncio.sleep") as mock_sleep:
await subscriber._post_events()
assert mock_client.request.call_count == 3
assert mock_sleep.call_count == 2
But other webhook subscriber tests create background flush timer tasks (WebhookSubscriber._flush_after_delay) that also call asyncio.sleep. When those tasks are still alive during this test, their sleeps are counted by the same global mock.
I also verified that the test passes in isolation on current origin/main, which supports this being suite-order/background-task pollution rather than deterministic behavior in the test itself:
git checkout origin/main --quiet
uv run pytest tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling -q
Result:
1 passed, 5 warnings in 0.23s
Nearby tests in the same file already acknowledge this pattern and avoid asserting raw global sleep count by collecting sleep delays and filtering to webhook_spec.retry_delay.
Proposed fixes
A few possible fixes, from smallest to more structural:
-
Smallest test-only fix: change test_timeout_error_handling (and likely test_network_error_handling) to use a side-effect that records delays, then assert only retry-delay sleeps:
sleep_calls = []
async def mock_sleep(delay):
sleep_calls.append(delay)
with patch("asyncio.sleep", side_effect=mock_sleep):
await subscriber._post_events()
retry_sleeps = [d for d in sleep_calls if d == webhook_spec.retry_delay]
assert len(retry_sleeps) == 2
-
Clean up background timers in tests: make tests that call await subscriber(event) and leave the queue below event_buffer_size also call await subscriber.close() or explicitly cancel/await subscriber._flush_timer in teardown. This prevents leaked _flush_after_delay() tasks from reaching later tests.
-
Better production/test seam: inject a sleep callable into WebhookSubscriber (defaulting to asyncio.sleep) so retry tests can patch only the subscriber instance instead of the module-global asyncio.sleep.
Notes
This surfaced on a main push after an unrelated agent-profile naming PR because that push ran the full tests/agent_server suite under xdist. The failed PR did not touch webhook subscriber logic.
This issue was created by an AI agent (OpenHands) on behalf of the user.
Summary
tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handlingis flaky in the full agent-server suite because it globally patchesasyncio.sleepand then asserts the raw global mock call count.Evidence
Main failed on push in
Run tests:d7392de550b95eac6552e770531f41cd7b3df7acFailure excerpt:
The failing test patches the process-global
asyncio.sleep:But other webhook subscriber tests create background flush timer tasks (
WebhookSubscriber._flush_after_delay) that also callasyncio.sleep. When those tasks are still alive during this test, their sleeps are counted by the same global mock.I also verified that the test passes in isolation on current
origin/main, which supports this being suite-order/background-task pollution rather than deterministic behavior in the test itself:Result:
Nearby tests in the same file already acknowledge this pattern and avoid asserting raw global sleep count by collecting sleep delays and filtering to
webhook_spec.retry_delay.Proposed fixes
A few possible fixes, from smallest to more structural:
Smallest test-only fix: change
test_timeout_error_handling(and likelytest_network_error_handling) to use a side-effect that records delays, then assert only retry-delay sleeps:Clean up background timers in tests: make tests that call
await subscriber(event)and leave the queue belowevent_buffer_sizealso callawait subscriber.close()or explicitly cancel/awaitsubscriber._flush_timerin teardown. This prevents leaked_flush_after_delay()tasks from reaching later tests.Better production/test seam: inject a sleep callable into
WebhookSubscriber(defaulting toasyncio.sleep) so retry tests can patch only the subscriber instance instead of the module-globalasyncio.sleep.Notes
This surfaced on a main push after an unrelated agent-profile naming PR because that push ran the full
tests/agent_serversuite under xdist. The failed PR did not touch webhook subscriber logic.This issue was created by an AI agent (OpenHands) on behalf of the user.