Fix race condition in ChangelogBackendTestCase.searchInChangeNumberModeOnOneSuffixMultipleTimes #604

Open
Copilot wants to merge 7 commits into master from copilot/fix-race-condition-in-changelog-test

Copilot AI commented Mar 30, 2026

Change number assignment is asynchronous on the server side, so a search issued immediately after publishing replication messages could run ahead of the change number mapper. This caused intermittent assertion failures when the expected 4 entries were not yet visible within the 10-second retry window.

Changes

  • ChangelogBackendTestCase.java — added two assertChangelogAttributesInRootDSE synchronization barriers in searchInChangeNumberModeOnOneSuffixMultipleTimes:
    • assertChangelogAttributesInRootDSE(1, 4) before searching the first batch (CNs 1–4)
    • assertChangelogAttributesInRootDSE(1, 8) before searching the second batch (CNs 5–8)
CSN[] csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
// Wait until changenumber 4 is visible before searching
assertChangelogAttributesInRootDSE(1, 4);
searchChangesForEachOperationTypeUsingChangeNumberMode(1, csns, testName);

csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
// Wait until changenumber 8 is visible before searching
assertChangelogAttributesInRootDSE(1, 8);
searchChangesForEachOperationTypeUsingChangeNumberMode(5, csns, testName);

assertChangelogAttributesInRootDSE already polls lastchangenumber in the root DSE via a TestTimer retry loop, which makes it a more appropriate barrier than widening the search timeout.
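
For readers unfamiliar with this kind of barrier, the following is a minimal, self-contained sketch of a poll-until-visible wait. It is not the OpenDJ TestTimer API or the body of assertChangelogAttributesInRootDSE: the class PollingBarrier, the method awaitAtLeast, and the AtomicLong standing in for lastchangenumber are all hypothetical names chosen for illustration, and the "at least" comparison is an assumption about what such a barrier needs to check.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical polling barrier for illustration; not the OpenDJ TestTimer API. */
public final class PollingBarrier {
  private PollingBarrier() {}

  /**
   * Polls {@code current} until it reports at least {@code expected},
   * or fails with an AssertionError once {@code timeoutMillis} has elapsed.
   */
  public static void awaitAtLeast(LongSupplier current, long expected,
      long timeoutMillis, long intervalMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      long seen = current.getAsLong();
      if (seen >= expected) {
        return; // the awaited state is visible, the caller may proceed
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new AssertionError("expected at least " + expected + " but last saw " + seen);
      }
      Thread.sleep(intervalMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the lastchangenumber value exposed by the server (assumption).
    AtomicLong lastChangeNumber = new AtomicLong(0);

    // Stand-in for the asynchronous mapper: assigns four numbers with a delay.
    Thread mapper = new Thread(() -> {
      for (int i = 0; i < 4; i++) {
        try {
          Thread.sleep(50);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        lastChangeNumber.incrementAndGet();
      }
    });
    mapper.start();

    // Barrier: do not "search" until change number 4 is visible.
    awaitAtLeast(lastChangeNumber::get, 4, 5_000, 10);
    System.out.println("lastChangeNumber=" + lastChangeNumber.get());
    mapper.join();
  }
}
```

The point of the sketch is that the wait is tied to the condition the search depends on, so a longer search timeout is not needed to absorb the mapper's delay.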

Original prompt

Problem

The CI job for PR #599 is failing with:

[ERROR] Failures:
[ERROR]   ChangelogBackendTestCase.searchInChangeNumberModeOnOneSuffixMultipleTimes:588->searchChangesForEachOperationTypeUsingChangeNumberMode:1044->searchChangelog:1108->searchChangelog:1124
The following assertion failed: []
at ChangelogBackendTestCase$6.call(ChangelogBackendTestCase$6.java:1134)
[ERROR] Tests run: 24943, Failures: 1, Errors: 0, Skipped: 185

Root Cause

The test searchInChangeNumberModeOnOneSuffixMultipleTimes in
opendj-server-legacy/src/test/java/org/opends/server/backends/ChangelogBackendTestCase.java
has a race condition. It publishes two batches of 4 replication messages and immediately searches
for them, but there is no synchronization barrier between publishing the messages and searching.
The change number assignment is done server-side asynchronously, so the second batch of change
numbers (5–8) may still not be visible when the 10-second searchChangelog retry loop times out,
causing the assertion softly.assertThat(searchOp.getSearchEntries()).hasSize(4) to fail.
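
The interleaving described above can be modelled in isolation. The sketch below is a hypothetical model, not OpenDJ code: AsyncAssignmentRace, publish, and search are invented names, and a CountDownLatch is used to force the ordering that in the real test occurs only intermittently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of asynchronous change number assignment; not OpenDJ code. */
public final class AsyncAssignmentRace {
  private final List<Integer> visible = new CopyOnWriteArrayList<>();
  private final ExecutorService mapper = Executors.newSingleThreadExecutor();
  private int next = 1; // only touched from the single mapper thread

  /** Queues {@code count} changes; numbers are assigned later, once {@code gate} opens. */
  public Future<?> publish(int count, CountDownLatch gate) {
    return mapper.submit(() -> {
      gate.await(); // models the mapper lagging behind the publisher
      for (int i = 0; i < count; i++) {
        visible.add(next++);
      }
      return null;
    });
  }

  /** Returns how many numbered changes a search would see right now. */
  public int search() {
    return visible.size();
  }

  public void shutdown() {
    mapper.shutdown();
  }

  public static void main(String[] args) throws Exception {
    AsyncAssignmentRace log = new AsyncAssignmentRace();
    CountDownLatch gate = new CountDownLatch(1);

    Future<?> assigned = log.publish(4, gate);
    int tooEarly = log.search();   // publish has returned, but nothing is numbered yet

    gate.countDown();              // let the mapper run
    assigned.get();                // barrier: wait until assignment has finished
    int afterBarrier = log.search();

    System.out.println(tooEarly + " then " + afterBarrier); // prints "0 then 4"
    log.shutdown();
  }
}
```

In this model the search that runs before the barrier sees no entries, while the same search after the barrier sees all four, which is the difference the added synchronization is meant to guarantee.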

Fix (Option 2 — proper synchronization)

Add assertChangelogAttributesInRootDSE calls before each search to wait for the change
number mapper to finish assigning numbers. The assertChangelogAttributesInRootDSE method already
uses a TestTimer retry loop and checks lastchangenumber in the root DSE, making it the correct
synchronization point.

File to modify

opendj-server-legacy/src/test/java/org/opends/server/backends/ChangelogBackendTestCase.java

Modify the searchInChangeNumberModeOnOneSuffixMultipleTimes method (lines 580–611) as follows:

Before:

@Test
public void searchInChangeNumberModeOnOneSuffixMultipleTimes() throws Exception
{
  replicationServer.getChangelogDB().setPurgeDelay(0);

  // write 4 changes starting from changenumber 1, and search them
  String testName = "Multiple/1";
  CSN[] csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
  searchChangesForEachOperationTypeUsingChangeNumberMode(1, csns, testName);

  // write 4 more changes starting from changenumber 5, and search them
  testName = "Multiple/5";
  csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
  searchChangesForEachOperationTypeUsingChangeNumberMode(5, csns, testName);

  // search from the provided change number: 6 (should be the add msg)
  CSN csnOfLastAddMsg = csns[1];
  searchChangelogForOneChangeNumber(6, csnOfLastAddMsg);

  // search from a provided change number interval: 5-7
  searchChangelogFromToChangeNumber(5,7);

  assertChangelogAttributesInRootDSE(1, 8);

  // add a new change, then check again first and last change number without previous search
  testName = "Multiple/9";
  CSN lastCsn = csns[csns.length - 1];
  CSN csn = new CSN(lastCsn.getTime() + 1, 9, server1.getServerId());
  publishUpdateMessagesInOTest(testName, false, generateDeleteMsg(server1, csn, testName, 1));

  assertChangelogAttributesInRootDSE(1, 9);
}

After:

@Test
public void searchInChangeNumberModeOnOneSuffixMultipleTimes() throws Exception
{
  replicationServer.getChangelogDB().setPurgeDelay(0);

  // write 4 changes starting from changenumber 1, and search them
  String testName = "Multiple/1";
  CSN[] csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
  // Wait until changenumber 4 is visible before searching
  assertChangelogAttributesInRootDSE(1, 4);
  searchChangesForEachOperationTypeUsingChangeNumberMode(1, csns, testName);

  // write 4 more changes starting from changenumber 5, and search them
  testName = "Multiple/5";
  csns = generateAndPublishUpdateMsgForEachOperationType(testName, false);
  // Wait until changenumber 8 is visible before searching
  assertChangelogAttributesInRootDSE(1, 8);
  searchChangesForEachOperationTypeUsingChangeNumberMode(5, csns, testName);

  // search from the provided change number: 6 (should be the add msg)
  CSN csnOfLastAddMsg = csns[1];
  searchChangelogForOneChangeNumber(6, csnOfLastAddMsg);

  // search from a provided change number interval: 5-7
  searchChangelogFromToChangeNumber(5,7);

  assertChangelogAttributesInRootDSE(1, 8);

  // add a new change, then check again first and last change number without previous search
  testName = "Multiple/9";
  CSN lastCsn = csns[csns.length - 1];
  CSN csn = new CSN(lastCsn.getTime() + 1, 9, server1.getServerId());
  publishUpdateMessagesInOTest(testName, false, generateDeleteMsg(server1, csn, testName, 1));

  assertChangelogAttributesInRootDSE(1, 9);
}

Summary of changes

  • Before the first call to searchChangesForEachOperationTypeUsingChangeNumberMode(1, ...), add assertChangelogAttributesInRootDSE(1, 4) to wait for change numbers 1–4 to be assigned.
  • Before the second call to searchChangesForEachOperationTypeUsingChangeNumberMode(5, ...), add assertChangelogAttributesInRootDSE(1, 8) to wait for change numbers 5–8 to be...

This pull request was created from Copilot chat.

@vharseko vharseko changed the title [WIP] Fix race condition in ChangelogBackendTestCase Fix race condition in ChangelogBackendTestCase Mar 30, 2026
Copilot AI changed the title Fix race condition in ChangelogBackendTestCase Fix race condition in ChangelogBackendTestCase.searchInChangeNumberModeOnOneSuffixMultipleTimes Mar 30, 2026
Copilot AI requested a review from vharseko March 30, 2026 13:40
@vharseko vharseko changed the base branch from copilot/fix-inverted-logic-substring-check to master March 30, 2026 16:59
@vharseko vharseko marked this pull request as ready for review March 30, 2026 17:00
@vharseko vharseko requested a review from maximthomas March 30, 2026 17:00