fix: improve reconnection stability and fix resource leaks #384

Open
Yana-Hangabina wants to merge 1 commit into overleaf-workshop:master from Yana-Hangabina:fix-reconnect-stability

Yana-Hangabina commented Jul 5, 2026

Summary

Fixes several bugs in the socket reconnection flow that cause crashes, lost file sync operations, and resource leaks. These issues affect all users but are more visible on unstable connections (closes #309).

Problems

  1. _resolveById throws when VFS root is uninitialized: during reconnect, this.root is briefly undefined. Any call to _resolveById (from ClientManager.removePosition, cursor tracking, etc.) throws EntryNotFound, crashing the 500ms status timer chain. All callers already handle undefined returns via optional chaining.

  2. SCMCollectionProvider destroyed on every reconnect: initializingPromise unconditionally disposes and recreates both ClientManager and SCMCollectionProvider after each successful joinProject. This kills in-progress LocalReplicaSCMProvider.overwrite() (file sync), discards file watchers, and resets collaboration state.

  3. connectionRejected listener leak in joinProject: uses .on() instead of .once(), accumulating a new listener on every joinProject call. After ~10 reconnects, triggers MaxListenersExceededWarning.

  4. EventBus.on('socketioConnectedEvent') leak: updateEventHandlers registers a new EventBus listener for onConnectionAccepted on every call without removing the previous one.

  5. 5-second socket emit timeout: too aggressive for connections with non-trivial latency (proxy, remote servers). joinDoc calls frequently time out, cascading into disconnects.

  6. error handler throws uncaught exceptions: initInternalHandlers throws on socket errors, which can crash the extension host.
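
To illustrate problem 3, here is a minimal sketch of how listeners pile up with .on() but not with .once(). It assumes the socket behaves like a Node.js EventEmitter, which the MaxListenersExceededWarning suggests but the PR does not state; `socket` and `joinWith` are hypothetical names, not the extension's code.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the socket; not the extension's real class.
const socket = new EventEmitter();

// Simulates one join attempt that ends in a rejection.
function joinWith(register: "on" | "once"): void {
  const onRejected = (): void => {
    // A real handler would reject the pending join here.
  };
  if (register === "on") {
    socket.on("connectionRejected", onRejected);
  } else {
    socket.once("connectionRejected", onRejected);
  }
  socket.emit("connectionRejected");
}

// With .on(), every attempt leaves its listener attached.
for (let i = 0; i < 5; i++) joinWith("on");
const leaked = socket.listenerCount("connectionRejected");

// With .once(), the listener detaches itself as soon as it fires.
socket.removeAllListeners("connectionRejected");
for (let i = 0; i < 5; i++) joinWith("once");
const remaining = socket.listenerCount("connectionRejected");

console.log({ leaked, remaining });
```

Note that a .once() listener only detaches when its event fires, so a join that succeeds without ever emitting connectionRejected would still leave one listener behind unless it is removed explicitly.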

Changes

| File | Change |
| --- | --- |
| src/core/remoteFileSystemProvider.ts | _resolveById returns undefined when root is not set; skip SCM/ClientManager recreation when they already exist |
| src/collaboration/clientManager.ts | Add reconnect() method to re-register handlers on new socket; save handlers as named object; guard removePosition in updateStatus forEach |
| src/api/socketio.ts | Increase emit/joinProject timeout to 15s; use .once() for connectionRejected; dispose previous EventBus listener before re-registering; log errors instead of throwing |
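
The "dispose previous EventBus listener before re-registering" change can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `TinyBus`, `Handlers`, and `update` are hypothetical, and the real EventBus may expose a different API. The only idea taken from the PR is that the subscription handle is kept and released before a new one is created.

```typescript
// Minimal disposable contract, similar in shape to VS Code's Disposable.
interface Disposable {
  dispose(): void;
}

// Hypothetical tiny event bus; the extension's EventBus may differ.
class TinyBus {
  private listeners = new Map<string, Set<() => void>>();

  on(event: string, handler: () => void): Disposable {
    const set = this.listeners.get(event) ?? new Set<() => void>();
    set.add(handler);
    this.listeners.set(event, set);
    return { dispose: () => { set.delete(handler); } };
  }

  fire(event: string): void {
    // Copy first so handlers may unsubscribe while we iterate.
    for (const handler of Array.from(this.listeners.get(event) ?? [])) {
      handler();
    }
  }

  count(event: string): number {
    return this.listeners.get(event)?.size ?? 0;
  }
}

class Handlers {
  private bus: TinyBus;
  private subscription?: Disposable;
  accepted = 0;

  constructor(bus: TinyBus) {
    this.bus = bus;
  }

  // Safe to call repeatedly: the previous subscription is released first.
  update(): void {
    this.subscription?.dispose();
    this.subscription = this.bus.on("socketioConnectedEvent", () => {
      this.accepted++;
    });
  }
}

const bus = new TinyBus();
const handlers = new Handlers(bus);
for (let i = 0; i < 3; i++) handlers.update();
bus.fire("socketioConnectedEvent");
console.log(bus.count("socketioConnectedEvent"), handlers.accepted);
```

Without the dispose call, three calls to update() would leave three listeners registered and a single event would run the callback three times.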

Test plan

  • Open a project, verify file sync completes without "Local Replica creation failed"
  • Disconnect and reconnect network, verify status bar recovers and collaboration state is preserved
  • Check developer console for absence of MaxListenersExceededWarning and EntryNotFound errors
  • Verify real-time collaboration (cursor tracking, user list) works after reconnect

- Return undefined instead of throwing in _resolveById when VFS root
  is not initialized, preventing EntryNotFound crashes during reconnect
- Reuse existing ClientManager and SCMCollectionProvider on reconnect
  instead of disposing and recreating them, which kills in-progress
  file sync operations and loses collaboration state
- Add ClientManager.reconnect() to re-register socket event handlers
  on the new connection without full teardown
- Increase socket emit and joinProject timeout from 5s to 15s to
  accommodate higher latency connections
- Use .once() instead of .on() for connectionRejected in joinProject
  to prevent listener accumulation across reconnects
- Dispose previous EventBus listener before registering new one in
  onConnectionAccepted handler to prevent memory leak
- Log socket errors instead of throwing to avoid crashing the
  extension host
- Guard removePosition calls in updateStatus forEach to prevent
  timer chain interruption
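
As a rough sketch of the first and last items above, the lookup can return undefined while the root is unset, and the status loop can isolate failures per client. `TinyVfs`, `Entry`, and this `updateStatus` signature are hypothetical simplifications, and the try/catch is only one possible form of the guard; the PR does not show which form it uses.

```typescript
interface Entry {
  id: string;
  name: string;
}

// Hypothetical simplified VFS; the real provider tracks much more state.
class TinyVfs {
  root?: Map<string, Entry>;

  // Returns undefined rather than throwing while root is unset,
  // for example in the middle of a reconnect.
  resolveById(id: string): Entry | undefined {
    if (this.root === undefined) return undefined;
    return this.root.get(id);
  }
}

// One failing client must not stop the remaining updates.
function updateStatus(ids: string[], remove: (id: string) => void): number {
  let failures = 0;
  ids.forEach((id) => {
    try {
      remove(id);
    } catch {
      failures++;
    }
  });
  return failures;
}

const vfs = new TinyVfs();
// Callers use optional chaining, so an unset root is harmless.
const before = vfs.resolveById("doc1")?.name;

vfs.root = new Map([["doc1", { id: "doc1", name: "main.tex" }]]);
const after = vfs.resolveById("doc1")?.name;

const processed: string[] = [];
const failures = updateStatus(["a", "b", "c"], (id) => {
  if (id === "b") throw new Error("EntryNotFound");
  processed.push(id);
});
console.log(before, after, processed, failures);
```

Here the failure on "b" is counted, and "c" is still processed, which is the property that keeps a periodic timer chain alive.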
Development

Successfully merging this pull request may close these issues.

connection lost