Draft: Metrics instrumentation across all DataStoreTypes (ZK, HTTP, DROVE) #108

Open
jitendradhawan wants to merge 29 commits into appform-io:2.x from jitendradhawan:metrics_instrumentation

@jitendradhawan

Contributor

This MR adds fine-grained operational metrics instrumentation to the Ranger Service Discovery framework, covering all three data store types (ZK, HTTP, DROVE). It introduces a centralized MetricRecorder utility class for contextual metric recording at key operational boundaries.

jitendradhawan and others added 24 commits April 8, 2026 08:48
…r-fix

[Fix] Collision Check concurrency handling
…nanoseconds-fix

[Fix] Prevent duplicate IDs by switching to monotonic clock (nanoTime)

@ToOnlyGaurav ToOnlyGaurav left a comment

Please go through the comments.

  • Let's rename metric_id to something meaningful, such as upstream_id or dataSourceId.

@SuperBuilder
public abstract class AbstractRangerHubClient<T, R extends ServiceRegistry<T>, D extends Deserializer<T>> implements RangerHubClient<T,R> {

private final String metricId;

Let's give it a generic name that is closer to its function. Keeping a non-functional attribute alongside the functional group looks a bit off.

Contributor Author

renamed to upstreamId


// Wait for initial update and a brief period for metric recording to complete
awaitRefresh(registry);
sleep(100); // Allow time for MetricRecorder call after updateNodes

Is it possible to use a synchronisation mechanism that gives better guarantees than sleep? These tests fail most of the time in slow environments.

Contributor Author

Now using Awaitility instead of sleep, polling periodically with a wait of a few milliseconds between checks.
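The idea behind replacing a fixed sleep with a polling wait can be sketched without any test library. The helper below is a hypothetical, stdlib-only stand-in for what an Awaitility-style wait does; it is not the code from this PR, and the names PollingAwait and awaitUntil are made up for illustration. The test proceeds as soon as the condition holds and fails only after a generous timeout, so a slow environment costs time rather than correctness.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper: a stdlib-only stand-in for an Awaitility-style wait.
final class PollingAwait {

    private PollingAwait() {
    }

    // Polls the condition every pollInterval until it holds or the timeout elapses.
    // Returns true if the condition became true in time, false on timeout or interruption.
    static boolean awaitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean metricRecorded = new AtomicBoolean(false);

        // Simulates a metric being recorded asynchronously, some time after the refresh.
        Thread recorder = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            metricRecorded.set(true);
        });
        recorder.start();

        boolean seen = awaitUntil(metricRecorded::get, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("metric observed: " + seen);
        recorder.join();
    }
}
```

In a real test the condition would be an assertion on the recorded metric, and the timeout would be set well above the expected latency.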

List<ServiceNode<T>> nodes = new ArrayList<>(children.size());
log.debug("Found {} nodes for [{}]", children.size(), serviceName);
if(children.isEmpty()){
MetricRecorder.recordNullOrEmptyListNodeResponse(DataStoreType.ZK, metricId);

Can we have a wrapper method for these two?

@jitendradhawan jitendradhawan Jun 3, 2026

Contributor Author

extracted new method recordNullOrEmptyResponse to encapsulate both methods
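One possible shape for such a wrapper is sketched below. The thread does not show the two calls being wrapped, so this assumes they correspond to the null case and the empty case. The class NullOrEmptyRecorderSketch, its in-memory counters, and the event names nullResponse and emptyResponse are hypothetical stand-ins; the real MetricRecorder records into a metrics registry and may be structured differently.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for MetricRecorder: counts events in memory instead of a real registry.
final class NullOrEmptyRecorderSketch {

    private static final Map<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();

    private NullOrEmptyRecorderSketch() {
    }

    // Single entry point: the caller passes the response and the wrapper decides what to record.
    // Returns true when the response was null or empty, so the caller can branch on the result.
    static boolean recordNullOrEmptyResponse(String dataStoreType, String upstreamId, Collection<?> response) {
        if (response == null) {
            increment(dataStoreType, upstreamId, "nullResponse");
            return true;
        }
        if (response.isEmpty()) {
            increment(dataStoreType, upstreamId, "emptyResponse");
            return true;
        }
        return false;
    }

    // Reads back a counter; returns 0 when the event was never recorded.
    static long count(String dataStoreType, String upstreamId, String event) {
        LongAdder adder = COUNTERS.get(key(dataStoreType, upstreamId, event));
        return adder == null ? 0L : adder.sum();
    }

    private static void increment(String dataStoreType, String upstreamId, String event) {
        COUNTERS.computeIfAbsent(key(dataStoreType, upstreamId, event), k -> new LongAdder()).increment();
    }

    private static String key(String dataStoreType, String upstreamId, String event) {
        return String.join(".", dataStoreType, upstreamId, event);
    }

    public static void main(String[] args) {
        recordNullOrEmptyResponse("ZK", "upstream-1", null);
        recordNullOrEmptyResponse("ZK", "upstream-1", List.of());
        recordNullOrEmptyResponse("ZK", "upstream-1", List.of("node-a"));
        System.out.println("null responses:  " + count("ZK", "upstream-1", "nullResponse"));
        System.out.println("empty responses: " + count("ZK", "upstream-1", "emptyResponse"));
    }
}
```

With a wrapper like this, the call site keeps a single metrics-related line and the null/empty distinction stays inside the recorder.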

public static final String STALE_DATA_RETAINED = "staleDataRetained";
public static final String ZK_READ = "zkRead";

private static MetricRegistry metricRegistry = new MetricRegistry();

new MetricRegistry(); is of no use.

Contributor Author

removed


public static void recordNodeDataSinkUpdateStatus(DataStoreType dataStoreType, String metricId, String status) {
if (metricRegistry != null) {
metricRegistry.meter(MetricRegistry.name(PACKAGE_PREFIX, DATA_SOURCE, dataStoreType.name(),

This looks a bit confusing when read together with line #110.

Contributor Author

Resolved by using DATA_STORE_TYPE, dataStoreType.name(), DATA_SOURCE, upstreamId in all MetricRecorder methods
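To illustrate the resulting naming scheme, here is a minimal sketch that builds a metric name from alternating key and value segments. The constant values ("ranger", "dataStoreType", "dataSource"), the class MetricNameSketch, and the trailing segments are assumptions made for illustration; the PR does not show the real values. The snippets above call Dropwizard's MetricRegistry.name, which joins its non-empty parts with dots; the sketch reproduces that joining with the standard library so it runs on its own.

```java
import java.util.StringJoiner;

// Hypothetical sketch of key/value style metric naming; not the PR's MetricRecorder.
final class MetricNameSketch {

    // Assumed constant values; the real ones are not shown in the thread.
    static final String PACKAGE_PREFIX = "ranger";
    static final String DATA_STORE_TYPE = "dataStoreType";
    static final String DATA_SOURCE = "dataSource";

    enum DataStoreType { ZK, HTTP, DROVE }

    private MetricNameSketch() {
    }

    // Each value is preceded by the key that describes it, so the name reads unambiguously:
    // <prefix>.dataStoreType.<type>.dataSource.<upstreamId>.<suffix...>
    static String metricName(DataStoreType type, String upstreamId, String... suffix) {
        StringJoiner joiner = new StringJoiner(".");
        joiner.add(PACKAGE_PREFIX)
                .add(DATA_STORE_TYPE).add(type.name())
                .add(DATA_SOURCE).add(upstreamId);
        for (String part : suffix) {
            joiner.add(part);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints "ranger.dataStoreType.ZK.dataSource.upstream-1.nodeDataSinkUpdate.success"
        System.out.println(metricName(DataStoreType.ZK, "upstream-1", "nodeDataSinkUpdate", "success"));
    }
}
```

Pairing each key with its value avoids the ambiguity raised in the review, where a segment such as DATA_SOURCE was followed directly by the data store type.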

? oldV
: newV))
.values();
MetricRecorder.recordServiceNodesReturned(service.getServiceName(), serviceNodes.size());

If we are getting these from upstreams, how will we figure out which upstream is returning the wrong value?

Contributor Author

We now also push, from ServiceRegistryUpdater, the count of nodes fetched from any upstream.

dataLock.lock();
try {
- long resolvedTime = resolution.convert(timeInMillis, TimeUnit.MILLISECONDS);
+ long resolvedTime = resolution.convert(

I didn't understand the need for this change.

Contributor Author

}

- public boolean check(long timeInMillis, int location) {
+ public long checkAndGetTime(int location) {

This method looks overloaded from the response perspective.

Contributor Author

@@ -31,8 +31,18 @@
public class CollisionChecker {

@r0goyal Please take a look.

val serviceRegistrationResponse = registerService(service.getServiceName(), httpUrl, requestBody).orElse(null);
if(null == serviceRegistrationResponse || !serviceRegistrationResponse.valid()){
log.warn("Http call to {} returned a failure response {}", httpUrl, serviceRegistrationResponse);
MetricRecorder.recordNullOrEmptyRegisterServiceResponse(DataStoreType.HTTP, metricId, service.getServiceName());

Let's shrink these two methods into one.
We don't want metrics-related code surfacing more than necessary.

Contributor Author

extracted to recordNullOrEmptyRegisterServiceResponse

@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

4 participants