Draft: Metrics instrumentation across all DataStoreTypes (ZK, HTTP, DROVE) #108
Draft: Metrics instrumentation across all DataStoreTypes (ZK, HTTP, DROVE) #108jitendradhawan wants to merge 29 commits into
Conversation
…ng instead of a list
…r-fix [Fix] Collision Check concurrency handling
…nanoseconds-fix [Fix] Prevent duplicate IDs by switching to monotonic clock (nanoTime)
… metrics recording
…ceRegistryUpdaterMetricsIntegrationTest
…date log messages for clarity
ToOnlyGaurav
left a comment
There was a problem hiding this comment.
Please go through comments
- Let's call metric_id to be something meaningful like upstream_id or dataSourceId etc.
| @SuperBuilder | ||
| public abstract class AbstractRangerHubClient<T, R extends ServiceRegistry<T>, D extends Deserializer<T>> implements RangerHubClient<T,R> { | ||
|
|
||
| private final String metricId; |
There was a problem hiding this comment.
Let's keep it generic name closer to functional. Keeping non-functional attribute with functional group looks a bit off
There was a problem hiding this comment.
renamed to upstreamId
|
|
||
| // Wait for initial update and a brief period for metric recording to complete | ||
| awaitRefresh(registry); | ||
| sleep(100); // Allow time for MetricRecorder call after updateNodes |
There was a problem hiding this comment.
Possible to use any synchronisation mechanism which gives better guarantees than sleep. Most of the time these tests fails if we have slow environments.
There was a problem hiding this comment.
Using awaitility now instead of sleep to check periodically with few ms wait
| List<ServiceNode<T>> nodes = new ArrayList<>(children.size()); | ||
| log.debug("Found {} nodes for [{}]", children.size(), serviceName); | ||
| if(children.isEmpty()){ | ||
| MetricRecorder.recordNullOrEmptyListNodeResponse(DataStoreType.ZK, metricId); |
There was a problem hiding this comment.
Can we have wrapper method for these two.
There was a problem hiding this comment.
extracted new method recordNullOrEmptyResponse to encapsulate both methods
| public static final String STALE_DATA_RETAINED = "staleDataRetained"; | ||
| public static final String ZK_READ = "zkRead"; | ||
|
|
||
| private static MetricRegistry metricRegistry = new MetricRegistry(); |
There was a problem hiding this comment.
new MetricRegistry(); is of no use.
|
|
||
| public static void recordNodeDataSinkUpdateStatus(DataStoreType dataStoreType, String metricId, String status) { | ||
| if (metricRegistry != null) { | ||
| metricRegistry.meter(MetricRegistry.name(PACKAGE_PREFIX, DATA_SOURCE, dataStoreType.name(), |
There was a problem hiding this comment.
Looks a bit confusion when read with line#110
There was a problem hiding this comment.
Resolved by using DATA_STORE_TYPE, dataStoreType.name(), DATA_SOURCE, upstreamId in all MetricRecorder methods
| ? oldV | ||
| : newV)) | ||
| .values(); | ||
| MetricRecorder.recordServiceNodesReturned(service.getServiceName(), serviceNodes.size()); |
There was a problem hiding this comment.
If we are getting these from upstreams, how we will figure out which upstream is returning wrong value?
There was a problem hiding this comment.
Now we're pushing node count also which are fetched from any upstream in ServiceRegistryUpdater
| dataLock.lock(); | ||
| try { | ||
| long resolvedTime = resolution.convert(timeInMillis, TimeUnit.MILLISECONDS); | ||
| long resolvedTime = resolution.convert( |
There was a problem hiding this comment.
Didn't get the need of doing this?
| } | ||
|
|
||
| public boolean check(long timeInMillis, int location) { | ||
| public long checkAndGetTime(int location) { |
There was a problem hiding this comment.
This method looks overloaded from the response perspective.
| @@ -31,8 +31,18 @@ | |||
| public class CollisionChecker { | |||
| val serviceRegistrationResponse = registerService(service.getServiceName(), httpUrl, requestBody).orElse(null); | ||
| if(null == serviceRegistrationResponse || !serviceRegistrationResponse.valid()){ | ||
| log.warn("Http call to {} returned a failure response {}", httpUrl, serviceRegistrationResponse); | ||
| MetricRecorder.recordNullOrEmptyRegisterServiceResponse(DataStoreType.HTTP, metricId, service.getServiceName()); |
There was a problem hiding this comment.
let's shrink these two methods to one.
Don't want metrics related code to be surfaced more.
There was a problem hiding this comment.
extracted to recordNullOrEmptyRegisterServiceResponse
…or improved clarity
…s; simplify metric recording
|



This MR adds fine-grained operational metrics instrumentation to the Ranger Service Discovery framework, covering all three data store types (ZK, HTTP, DROVE). It introduces a centralized
MetricRecorderutility class for contextual metric recording at key operational boundaries.