Flaky webhook subscriber timeout test globally counts asyncio.sleep calls

## Summary

`tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling` is flaky in the full agent-server suite because it globally patches `asyncio.sleep` and then asserts the raw global mock call count.

## Evidence

Main failed on push in `Run tests`:

- Run: https://github.com/OpenHands/software-agent-sdk/actions/runs/27763693301
- Failed job: https://github.com/OpenHands/software-agent-sdk/actions/runs/27763693301/job/82144523424
- Commit: `d7392de550b95eac6552e770531f41cd7b3df7ac`

Failure excerpt:

```text
FAILED tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling - AssertionError: assert 1594 == 2
 +  where 1594 = <AsyncMock name='sleep' id='140409751330096'>.call_count
============ 1 failed, 1248 passed, 39 warnings in 70.82s (0:01:10) ============
```

The failing test patches the process-global `asyncio.sleep`:

```python
with patch("asyncio.sleep") as mock_sleep:
    await subscriber._post_events()

assert mock_client.request.call_count == 3
assert mock_sleep.call_count == 2
```

But other webhook subscriber tests create background flush timer tasks (`WebhookSubscriber._flush_after_delay`) that also call `asyncio.sleep`. When those tasks are still alive during this test, their sleeps are counted by the same global mock.

I also verified that the test passes in isolation on current `origin/main`, which supports this being suite-order/background-task pollution rather than deterministic behavior in the test itself:

```bash
git checkout origin/main --quiet
uv run pytest tests/agent_server/test_webhook_subscriber.py::TestWebhookSubscriberErrorHandling::test_timeout_error_handling -q
```

Result:

```text
1 passed, 5 warnings in 0.23s
```

Nearby tests in the same file already acknowledge this pattern and avoid asserting raw global sleep count by collecting sleep delays and filtering to `webhook_spec.retry_delay`.

## Proposed fixes

A few possible fixes, from smallest to more structural:

1. **Smallest test-only fix:** change `test_timeout_error_handling` (and likely `test_network_error_handling`) to use a side-effect that records delays, then assert only retry-delay sleeps:

   ```python
   sleep_calls = []

   async def mock_sleep(delay):
       sleep_calls.append(delay)

   with patch("asyncio.sleep", side_effect=mock_sleep):
       await subscriber._post_events()

   retry_sleeps = [d for d in sleep_calls if d == webhook_spec.retry_delay]
   assert len(retry_sleeps) == 2
   ```

2. **Clean up background timers in tests:** make tests that call `await subscriber(event)` and leave the queue below `event_buffer_size` also call `await subscriber.close()` or explicitly cancel/await `subscriber._flush_timer` in teardown. This prevents leaked `_flush_after_delay()` tasks from reaching later tests.

3. **Better production/test seam:** inject a sleep callable into `WebhookSubscriber` (defaulting to `asyncio.sleep`) so retry tests can patch only the subscriber instance instead of the module-global `asyncio.sleep`.

## Notes

This surfaced on a main push after an unrelated agent-profile naming PR because that push ran the full `tests/agent_server` suite under xdist. The failed PR did not touch webhook subscriber logic.

_This issue was created by an AI agent (OpenHands) on behalf of the user._


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky webhook subscriber timeout test globally counts asyncio.sleep calls #3791

Summary

Evidence

Proposed fixes

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Flaky webhook subscriber timeout test globally counts asyncio.sleep calls #3791

Description

Summary

Evidence

Proposed fixes

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions