Skip to content

[FFL-1720] Evaluation Logging: Aggregation Engine & Test Utilities#3145

Open
typotter wants to merge 39 commits intofeature/flags-evaluations-loggingfrom
typo/FFL-1720-pr2-aggregation
Open

[FFL-1720] Evaluation Logging: Aggregation Engine & Test Utilities#3145
typotter wants to merge 39 commits intofeature/flags-evaluations-loggingfrom
typo/FFL-1720-pr2-aggregation

Conversation

@typotter
Copy link
Contributor

@typotter typotter commented Jan 22, 2026

🥞 Evaluation Logging Stacked Pull Requests 🥞

🔲 Integration & Configuration (PR #3147)
🔲 Storage & Network Infrastructure (PR #3146)
👉 Aggregation Engine & Test Utilities (this PR)
☑️ FlagEvaluation Schema (PR #3166)
☑️ Event Schema & Data Models (PR #3144)
☑️ Evaluations Subfeature (PR #3159)
feature/flags-evaluations-logging (feature branch)

Datadog Internal
🎟️ Ticket: FFL-1720 - Implement Evaluation Logging for Android SDK

What does this PR do?

Implements the core evaluations aggregation engine that groups feature flag evaluations by key characteristics and manages periodic flushing from memory to persistent storage (via StorageBackedFeature/EvaluationsFeature).

The EvaluationEventProcessor manages concurrent access and mutation of AggregationStats instances in memory. Each Stats object is given an AggregationKey - a composite key made of the set of identifiers across the evaluation (targetingKey, variationKey, etc.) and SDK context (rum View ID). This ensures proper evaluation counting under the "same" circumstances.

The processor also manages the periodic and size-based triggers for flushing the in-memory aggregations to persistent storage.

Motivation

We need to implement Evaluation Logging to provide comprehensive visibility into all feature flag evaluations, including defaults, errors, and successful matches. This goes beyond exposure logging by capturing aggregated metrics about evaluation frequency, error rates, and runtime default usage across all flags.

Description

This PR adds the core business logic for evaluation logging:

Implementation:

  • AggregationKey: Defines how evaluations are grouped (by flag, variant, allocation, targeting key, and error code)
  • AggregationStats: Tracks count, first/last timestamps, and last error message for each aggregation
  • EvaluationEventsProcessor: Orchestrates aggregation, manages flush triggers (time-based, size-based, shutdown)
  • EvaluationEventWriter: Interface for persisting evaluation events (implementation in PR [FFL-1720] Evaluation Logging: Storage & Network Infrastructure #3146)

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from e1704c3 to 95f4c0a Compare January 22, 2026 09:27
Copy link
Contributor Author

typotter commented Jan 22, 2026

@datadog-official

This comment has been minimized.

Comment on lines 77 to 123
// Take atomic snapshot of statistics to prevent torn reads
val snapshotCount: Int
val snapshotFirst: Long
val snapshotLast: Long
val snapshotMessage: String?
synchronized(this) {
snapshotCount = count
snapshotFirst = firstEvaluation
snapshotLast = lastEvaluation
snapshotMessage = lastErrorMessage
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this piece is confusing. These fields are volatile, so torn read is not possible. Otherwise, this class is not designed to handle concurrent calls to recordEvaluation and toEvaluationEvent as that may leave some events uncounted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this is being overly defensive. Because of the CAS of the map of aggregation stats for flushing, _no other thread should be calling toEvaluationEvent or recordEvaluation on this AggregationStats instance.

Without this synchronized, recordEvaluation could change the computed values while we're getting them for constructing the FlagEvaluation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the synchronized CAS snapshotting the aggregationMap for flushing, this method should be protected from another thread calling recordEvaluation which could change the values as they're being grabbed for the FlagEvaluation constructor
Is this too defensive?

@typotter typotter mentioned this pull request Jan 27, 2026
3 tasks
@typotter typotter force-pushed the typo/FFL-1720-pr1-schema-models branch 2 times, most recently from eb13adb to c924067 Compare January 27, 2026 19:25
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 97e4a4e to eb0ace5 Compare January 27, 2026 19:29
@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 97.18310% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.99%. Comparing base (986afe7) to head (2c13aff).

Files with missing lines Patch % Lines
...flags/internal/aggregation/EvaluationAggregator.kt 93.55% 0 Missing and 2 partials ⚠️
...ndroid/flags/internal/EvaluationEventsProcessor.kt 98.25% 1 Missing ⚠️
...oid/flags/internal/aggregation/AggregationStats.kt 97.92% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                          Coverage Diff                          @@
##           feature/flags-evaluations-logging    #3145      +/-   ##
=====================================================================
+ Coverage                              70.81%   70.99%   +0.18%     
=====================================================================
  Files                                    901      905       +4     
  Lines                                  33172    33314     +142     
  Branches                                5596     5618      +22     
=====================================================================
+ Hits                                   23489    23648     +159     
+ Misses                                  8118     8106      -12     
+ Partials                                1565     1560       -5     
Files with missing lines Coverage Δ
...droid/flags/internal/aggregation/AggregationKey.kt 100.00% <100.00%> (ø)
...ndroid/flags/internal/EvaluationEventsProcessor.kt 98.25% <98.25%> (ø)
...oid/flags/internal/aggregation/AggregationStats.kt 97.92% <97.92%> (ø)
...flags/internal/aggregation/EvaluationAggregator.kt 93.55% <93.55%> (ø)

... and 35 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@typotter typotter force-pushed the typo/FFL-1720-pr1-schema-models branch from c924067 to a031b97 Compare January 28, 2026 06:34
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 968dc68 to 80c8de7 Compare January 28, 2026 06:55
@typotter typotter force-pushed the typo/FFL-1720-pr1-schema-models branch 2 times, most recently from ddeaaa3 to 75827d0 Compare January 28, 2026 07:27
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 2eb16ca to 0cfb30a Compare January 28, 2026 07:27
@typotter typotter requested a review from dd-oleksii January 28, 2026 07:27
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 0cfb30a to 141cded Compare January 28, 2026 07:50
@typotter typotter marked this pull request as ready for review January 28, 2026 07:57
@typotter typotter requested a review from a team as a code owner January 28, 2026 07:57
@typotter typotter force-pushed the typo/FFL-1720-pr1-schema-models branch 2 times, most recently from 1c0c827 to 00b8f55 Compare January 28, 2026 16:30
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 141cded to 4bfc37e Compare January 28, 2026 16:33
@typotter typotter force-pushed the typo/FFL-1720-pr1-schema-models branch from 00b8f55 to 7dba2fe Compare January 28, 2026 17:01
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from 4bfc37e to 550f9f0 Compare January 28, 2026 17:05
@typotter typotter force-pushed the typo/FFL-1720-pr2-aggregation branch from e09cbe5 to 66b62f4 Compare February 2, 2026 18:47
@typotter typotter requested a review from dd-oleksii February 2, 2026 18:58
Copy link
Contributor Author

@typotter typotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Oleksii, as always for your 👀 . ptal

dd-oleksii
dd-oleksii previously approved these changes Feb 3, 2026
private val context: EvaluationContext,
private val service: String?,
private val rumApplicationId: String?,
private val rumViewName: String?,
Copy link
Member

@dd-oleksii dd-oleksii Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: view name is duplicated in aggregation key. do we need it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 123 to 125
runtimeDefaultUsed = reason == null ||
reason == ResolutionReason.DEFAULT.name ||
reason == ResolutionReason.ERROR.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: the condition I would use here is aggregationKey.variantKey == null. Reasons are somewhat less reliable and it's valid to return runtime default with reasons besides "default" or "error"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. fixed

Comment on lines 69 to 71
if (shouldFlush) {
flush()
}
Copy link
Member

@dd-oleksii dd-oleksii Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: if there are many threads recording evaluations, it's still possible to flush too frequently:

  1. Thread A records an event, shouldFlush is true
  2. Thread B records an event, shouldFlush is true
  3. Thread A flushes all events, aggregation map is now empty
  4. Thread C records an event, shouldFlush is false
  5. Thread B tries to flush, flushes a single event (written by C)

Note in my previous message, the size limit is re-checked after acquiring a write lock and holding it through to the flush.

Two ways we can implement this in the current structure: either add a parameter to flush() for it to do an extra check and only flush if the size exceeds the limit. Alternatively, we can make aggregator.record() return drained records instead of a flag—then we can keep this synchronization entirely inside aggregator.

Copy link
Contributor Author

@typotter typotter Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return drained records instead of a flag—then we can keep this synchronization entirely inside aggregator.

Interesting approach

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the only downside is that it doesn't cancel scheduled flush holding the same lock, so it's possible for:

  1. Thread A calls aggregator.record() which triggers size limit and drains records
  2. Thread B writes more records
  3. Thread C — scheduled flush triggers
  4. Thread A flushes returned records
  5. Thread C continues with scheduled flush and writes just a few records

This is probably less severe than the previous situation with extra flushes due to size limit because there's at most one scheduled flush whereas the number of threads producing evaluations can be much higher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

}
val toDrain = aggregationMap
aggregationMap = ConcurrentHashMap()
return toDrain.map { (_, stats) -> stats.toEvaluationEvent() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this mapping is best done outside of write lock

)

if (drainedEvents != null) {
writer.writeAll(drainedEvents)
Copy link
Member

@dd-oleksii dd-oleksii Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-schedule periodic flush here? (don't forget flushMutex, so we don't schedule multiple futures simultaneously)

*
* @param events the evaluation events to write to storage
*/
fun writeAll(events: List<FlagEvaluation>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as I noted in another review, maybe it can be just write, especially if we don't expect another methods in this interface

val flagKey: String,
val variantKey: String? = null,
val allocationKey: String? = null,
val targetingKey: String?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there specific reason why targetingKey has no default value as other nullable properties of this class?

Should this property be nullable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be nullable because the user might not pass the targeting key. But null is not a good default value per se as that might hide issues/bugs.

I would suggest removing default values from the rest of the properties. If they are null, it's better to pass that explicitly

import kotlin.concurrent.withLock

/**
* Aggregates flag evaluation events before writing them to storage.
Copy link
Member

@0xnm 0xnm Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that aggregation logic can be done on the backend side, is there a reason we do this on the client side? Seems like doing it on the client side adds some complexity.

Or is it because otherwise we have to send too many events to the intake?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the client-side aggregation is to reduce network usage. Flag evaluations can be on the hot path in users' code, and would generate tons of events without aggregation.

Comment on lines +101 to +102
val toDrain = aggregationMap
aggregationMap = ConcurrentHashMap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simpler:

val toDrain = aggregationMap.toMap()
aggregationMap.clear()

and then aggregationMap can be simply val without @Volatile

errorMessage = errorMessage
)

mapLock.read {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although aggregationMap.putIfAbsent below is a write operation

existing?.recordEvaluation(timestamp, errorMessage)
}

if (aggregationMap.size < maxAggregations) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note:

* Bear in mind that the results of aggregate status methods including
 * {@code size}, {@code isEmpty}, and {@code containsValue} are typically
 * useful only when a map is not undergoing concurrent updates in other threads.

but probably it is fine is .size call is off by a bit

Comment on lines +40 to +43
private var count: Int = 1
private var firstEvaluation: Long = firstTimestamp
private var lastEvaluation: Long = firstTimestamp
private var lastErrorMessage: String? = errorMessage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done simpler, without mutability and synchronisation primitives.

AggregationStats can be just a data class, meaning when update is needed .copy method is called at the usage side.

FlagEvaluation.Rum(
application = FlagEvaluation.Application(id = appId),
view = aggregationKey.viewName?.let { viewName ->
FlagEvaluation.View(url = viewName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking View URL != View Name. Is there a mistake here?

@Volatile
private var aggregationMap = ConcurrentHashMap<AggregationKey, AggregationStats>()

private val mapLock = ReentrantReadWriteLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we guard concurrent access to this class from the processor side, why do we need another guard here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants