[flags] fix: use write lock for addListener by typotter · Pull Request #3132 · DataDog/dd-sdk-android

typotter · 2026-01-19T17:33:51Z

What does this PR do?

Uses a write lock instead of read when adding a listener

Motivation

Adding a listener effectively changes the contested state we want to keep atomic, {currentState, listeners}, so it must block other write calls. Otherwise, a race condition could result in a listener missing a state change.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Make sure you discussed the feature or bugfix with the maintaining team in an Issue
Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

typotter · 2026-01-19T17:34:32Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt

        val listener1 = object : FlagsStateListener {
            override fun onStateChanged(newState: FlagsClientState) {
-                if (newState is FlagsClientState.Ready) {
+                if (newState == FlagsClientState.Ready) {


FlagsClientState is a sealed class
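To illustrate why `==` and `is` can be interchanged here, the sketch below uses a hypothetical `DemoState` sealed class (not the SDK's `FlagsClientState`, whose members may differ) and assumes `Ready` is declared as an `object`. Under that assumption there is a single `Ready` instance, so an equality check and a type check agree.

```kotlin
// Hypothetical stand-in for FlagsClientState; the real class may have different members.
sealed class DemoState {
    object NotReady : DemoState()
    object Ready : DemoState()
    data class Error(val cause: Throwable?) : DemoState()
}

// `object` members are singletons, so equality against the instance is sufficient.
fun isReadyByEquality(state: DemoState): Boolean = state == DemoState.Ready

// The type check gives the same answer for singleton members.
fun isReadyByTypeCheck(state: DemoState): Boolean = state is DemoState.Ready
```

For members that carry data (like the hypothetical `Error` above), `is` remains the appropriate check, since `==` would compare field values.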

typotter · 2026-01-19T17:34:52Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt


    @Test
-    fun `M stop notifying subsequent listeners W updateState() { if listener throws }`() {
+    fun `M stop notifying subsequent listeners W updateState() { listener throws }`() {


polishing some names/comments while here

datadog-official · 2026-01-19T17:55:07Z

🎯 Code Coverage
• Patch Coverage: 0.00%
• Overall Coverage: 65.80% (+0.03%)

🔗 Commit SHA: 0558c4c

codecov-commenter · 2026-01-19T18:08:46Z

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 70.73%. Comparing base (6c11581) to head (0558c4c).

Files with missing lines	Patch %	Lines
...atadog/android/flags/internal/FlagsStateManager.kt	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3132   +/-   ##
========================================
  Coverage    70.73%   70.73%           
========================================
  Files          893      893           
  Lines        33000    33000           
  Branches      5549     5550    +1     
========================================
+ Hits         23341    23342    +1     
+ Misses        8102     8097    -5     
- Partials      1557     1561    +4

Files with missing lines	Coverage Δ
...atadog/android/flags/internal/FlagsStateManager.kt	`84.62% <0.00%> (-7.69%)`	⬇️

... and 39 files with indirect coverage changes

0xnm · 2026-01-20T08:15:01Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt

+    @Test
+    fun `M block getCurrentState calls W addListener() { slow listener notification }`() {
+        // Given
+        val stateOld = FlagsClientState.NotReady


nit: this should probably be oldState, to be consistent with the naming convention used for newState.

Or, for clarity, we could just inline FlagsClientState.NotReady where needed, since stateOld is a val and cannot be reassigned anyway.

0xnm · 2026-01-20T08:18:05Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt

+        addListenerThread.start()
+        addListenerStarted.await()
+        getCurrentStateThread.start()
+        getCurrentStateAttempted.await()


I think getCurrentStateAttempted is not actually needed: the necessary wait for getCurrentStateThread to complete is already provided by the getCurrentStateThread.join() call below.
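As a small illustration of the point about join(), the sketch below (hypothetical names, not the test under review) shows that `Thread.join()` returns only once the thread's body has finished, so work done on that thread is visible afterwards without a separate latch.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Starts a worker thread, waits for it with join(), and reports whether its work was observed.
fun runAndJoin(): Boolean {
    val finished = AtomicBoolean(false)
    val worker = Thread { finished.set(true) }
    worker.start()
    worker.join() // blocks until the worker's body has completed
    return finished.get()
}
```

A latch is still useful when the test must know that a thread has started or reached a given point, which join() alone does not report.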

0xnm · 2026-01-20T08:19:27Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt

+        val addListenerSlowCallbackStarted = CountDownLatch(1)
+        val getCurrentStateAttempted = CountDownLatch(1)
+
+        val operationTimestamps = mutableListOf<Pair<String, Long>>()


The synchronized calls can be removed if the type here is CopyOnWriteArrayList.
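A minimal sketch of that suggestion, assuming the list is only appended to from several threads and read after they finish. The function name and labels are hypothetical; only `operationTimestamps` mirrors the name in the diff.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.CountDownLatch

// Records (label, timestamp) pairs from several threads without explicit synchronized blocks.
fun recordFromThreads(threadCount: Int): List<Pair<String, Long>> {
    val operationTimestamps = CopyOnWriteArrayList<Pair<String, Long>>()
    val done = CountDownLatch(threadCount)
    repeat(threadCount) { index ->
        Thread {
            // add() is thread-safe on CopyOnWriteArrayList, so no external lock is needed.
            operationTimestamps.add("op-$index" to System.nanoTime())
            done.countDown()
        }.start()
    }
    done.await() // wait until every thread has recorded its entry
    return operationTimestamps.toList()
}
```

Each write copies the backing array, which is acceptable for a handful of test entries but would be a poor fit for a write-heavy collection.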

0xnm · 2026-01-20T08:37:49Z

...dk-android-flags/src/test/kotlin/com/datadog/android/flags/internal/FlagsStateManagerTest.kt

    }

+    @Test
+    fun `M block getCurrentState calls W addListener() { slow listener notification }`() {


Why do we want to block getCurrentState when addListener is called, if addListener is not changing the state?

🤔 We may have had the right lock here in the first place...

If addListener holds the read lock, then, ostensibly, nothing can write to the state (current state or the listeners). While addListener is technically mutating the protected state by adding a listener, the underlying mechanism itself is thread-safe/synchronized via CopyOnWriteArrayList, so it's fine if parallel calls to addListener attempt to mutate the set of listeners. The order of listener subscription is not important to maintain here.

We should just be able to rely on the underlying thread-safety of CopyOnWriteArrayList to avoid clobbering of the listener list, right?

Maybe we shouldn't wrap adding the listener in a lock, then?

We need to block writing to the current state until the first notification and addListener are complete. We don't get that atomicity with DDCoreSubscription, only the guarantee that the underlying calls that add the listener to the list are thread-safe and synchronized, so no listener will be lost when parallel calls are made.

In effect, I believe we can drop this change altogether and rely on the read lock in conjunction with the underlying lock in CopyOnWriteArrayList.

I think yes, we should. My thinking is the following:

getCurrentState covers only FlagsStateManager.currentState. The listeners are not part of FlagsStateManager.currentState, so they are unrelated to the getCurrentState call.

The FlagsStateManager.addListener call doesn't modify FlagsStateManager.currentState; it only reads it. That is why we added a read lock. If something is in the process of modifying FlagsStateManager.currentState, the caller will wait to acquire the read lock.

FlagsStateManager.addListener mutates the underlying listeners collection, but that mutation is atomic and thread-safe. The currentState value passed down is guarded by the read lock, and if an iteration is already happening in updateState -> subscription.notifyListeners, the new listener won't be part of the ongoing iteration (due to the use of CopyOnWriteArrayList in DDSubscription).

So I'm curious why we should block the getCurrentState call during the FlagsStateManager.addListener invocation.
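The snapshot behavior mentioned above can be shown in isolation. This is a single-threaded sketch with hypothetical names, not the SDK's subscription class: a listener registered while a notification pass is running is not visited by that pass, because `CopyOnWriteArrayList` iterators work on the array as it was when iteration began.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList

// Returns the names of the listeners notified during one pass in which the
// first listener registers another listener mid-iteration.
fun notifiedDuringAdd(): List<String> {
    val notified = mutableListOf<String>()
    val listeners = CopyOnWriteArrayList<() -> Unit>()
    val late: () -> Unit = { notified.add("late") }
    listeners.add {
        notified.add("first")
        listeners.add(late) // mutation while the loop below is iterating
    }
    for (listener in listeners) listener() // iterates over a snapshot of the list
    return notified
}
```

The late listener is present in the list after the pass and would be visited by the next one.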

The race condition could otherwise result in a listener missing a state change

Is this about the FlagsStateManager.currentState property? From the code I see, a listener shouldn't miss any state change, since the read lock cannot be acquired while the write lock is held, and vice versa.

Well, having read it more carefully, I agree with @nikita. The only thing that still makes me curious is that we add a listener under a lock but remove it without one. Yes, the underlying CopyOnWriteArrayList will protect us from ConcurrentModificationException; however, if updateState and removeListener are executed at the same time, the removed listener could get an undesired update (from subscription.notifyListeners), which could lead to undefined behavior. Not sure if this is a critical issue, but I'd prefer to maintain consistency here.
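The removal concern can be reproduced deterministically in a single thread. In this sketch (hypothetical names, standing in for an unlocked removeListener overlapping a notification pass) a listener is removed while the pass is in progress and still receives the notification, because the pass iterates over the earlier snapshot.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList

// Returns the names of the listeners notified during one pass in which the
// first listener removes the second one mid-iteration.
fun notifiedDuringRemove(): List<String> {
    val notified = mutableListOf<String>()
    val listeners = CopyOnWriteArrayList<() -> Unit>()
    val second: () -> Unit = { notified.add("second") }
    listeners.add {
        notified.add("first")
        listeners.remove(second) // removal while the loop below is iterating
    }
    listeners.add(second)
    for (listener in listeners) listener() // snapshot still contains `second`
    return notified
}
```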

Thanks @satween. Agreed that there's an apparent lack of synchronicity on removing a listener.

We are not just "adding a listener", however. We need to:

1. get the current state
2. notify the listener
3. add the listener to the list

All without the currentState changing, so we block changes with a read lock.
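The three steps can be sketched as follows. This is a simplified, hypothetical `DemoStateManager`, not the SDK's `FlagsStateManager`; it assumes a `ReentrantReadWriteLock` guards the state and a `CopyOnWriteArrayList` holds the listeners, as described in this thread.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.locks.ReentrantReadWriteLock
import kotlin.concurrent.read
import kotlin.concurrent.write

// Hypothetical simplification for illustration only.
class DemoStateManager<S>(initial: S) {
    private val lock = ReentrantReadWriteLock()
    private val listeners = CopyOnWriteArrayList<(S) -> Unit>()
    private var currentState: S = initial

    fun getCurrentState(): S = lock.read { currentState }

    // Read lock: updateState (write lock) cannot run until all three steps finish,
    // while getCurrentState and other addListener calls may proceed in parallel.
    fun addListener(listener: (S) -> Unit) {
        lock.read {
            val snapshot = currentState // 1. get the current state
            listener(snapshot)          // 2. notify the listener
            listeners.add(listener)     // 3. add the listener to the list
        }
    }

    // Write lock: excludes every reader, so no addListener call is mid-way through its steps.
    fun updateState(newState: S) {
        lock.write {
            currentState = newState
            listeners.forEach { it(newState) }
        }
    }
}
```

With this shape, a listener added before an update sees the initial state and then the new one, and a slow callback in addListener delays writers but not other readers.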

When we remove the listener, we have only that one operation to complete, so there isn't a risk of missing data, only, as you noted, of receiving extra data. We would need to add extra synchronization shared between the removeListener method and the updateState method. I think you're correct here though: a long-running listener callback could cause a removed listener to be notified after the call to removeListener completes.

The scale of adding/removing listeners will be very low (0, or 1 or maybe a few) so there is a very small risk here of a superfluous event.

I'll come back to this with an updated lock after a couple of other tasks

fix: use write lock for addListener

0558c4c

typotter marked this pull request as ready for review January 19, 2026 17:35

typotter requested a review from a team as a code owner January 19, 2026 17:35

typotter changed the title ~~fix: use write lock for addListener~~ [flags] fix: use write lock for addListener Jan 19, 2026

aleksandr-gringauz approved these changes Jan 20, 2026

0xnm reviewed Jan 20, 2026

5 participants
