Conversation

@ELENAGER (Member Author):

Remove noisy/non-informative logs, add context to critical log entries, and log at a higher scope than PVC to avoid repeated lines.

@asn1809 (Member) left a comment:

Nice, this helps make the logs easier to read.

v.log.Info("Separate PVCs using only StorageClass provisioner")
} else {
v.log.Info("Separate PVCs using peerClasses")
}
Member:

This looks like a debugging log about how we handle the PVCs. Move it to debug level?

Member Author:

Done

Member:

We hardly use the debug level, and adding logs at that level is not my preference. This case can be root-caused by inspecting the VRG resource and looking at the peerClass elements, so I would remove this log.

Member Author:

Removed
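
For context, "moving to debug level" with logr (the logging API controller-runtime uses) means raising the call's verbosity: V(1) and above is suppressed unless the log sink is configured to emit it. A minimal sketch, wrapping the original messages in a hypothetical helper:

import "github.com/go-logr/logr"

// separatePVCs is a hypothetical wrapper; the point is V(1), which
// logr treats as debug verbosity and hides by default.
func separatePVCs(log logr.Logger, usePeerClasses bool) {
    if usePeerClasses {
        log.V(1).Info("Separate PVCs using peerClasses")
    } else {
        log.V(1).Info("Separate PVCs using only StorageClass provisioner")
    }
}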

return nil, fmt.Errorf("no %s found to match provisioner and schedule", objType)
case 1:
v.log.Info(fmt.Sprintf("Found %s that matches provisioner and schedule %s/%s", objType,
storageClass.Provisioner, v.instance.Spec.Async.SchedulingInterval))
Member:

Do we log the result of this search later?

Logging what we found once makes sense. If we don't log it elsewhere, this may be a debug log that helps in understanding what the system is doing.

@ELENAGER (Member Author), Oct 30, 2025:

@nirs the problem here is that we are logging it for each PVC, and then we see the following (all of these logs are for a single VRG with 10 PVCs during one reconcile request):

2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.113Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}
2025-10-29T11:37:36.206Z	INFO	vrg	controller/vrg_volrep.go:1451	Found VolumeGroupReplicationClass that matches provisioner and schedule openshift-storage.rbd.csi.ceph.com/5m	{"vrg": {"name":"cg-app-1","namespace":"openshift-dr-ops"}, "rid": "31b5bd24", "State": "primary"}

Member:

OK, so the per-PVC logs do not add any value; we can remove them.
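
One way to act on that, sketched under the assumption that the class lookup result is identical for every PVC in the VRG: log the match once before the per-PVC loop instead of inside it (processPVC is a hypothetical stand-in for the per-PVC work).

import (
    corev1 "k8s.io/api/core/v1"
    storagev1 "k8s.io/api/storage/v1"

    "github.com/go-logr/logr"
)

func protectPVCs(log logr.Logger, storageClass *storagev1.StorageClass,
    schedule string, pvcs []corev1.PersistentVolumeClaim) {
    // Logged once per reconcile, at VRG scope rather than PVC scope.
    log.Info("Found VolumeGroupReplicationClass that matches provisioner and schedule",
        "provisioner", storageClass.Provisioner, "schedule", schedule)

    for i := range pvcs {
        processPVC(&pvcs[i]) // per-PVC work, with no per-PVC "found" log
    }
}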

// this. If not found, then reconcile request would not be sent
selector, err := metav1.LabelSelectorAsSelector(&pvcSelector.LabelSelector)
if err != nil {
log1.Error(err, "Failed to get the label selector from VolumeReplicationGroup")
Member:

Is this an expected condition or an error?

Member Author:

@nirs according to the comment for this call, it can happen and is not an error:

		// continue if we fail to get the labels for this object hoping
		// that pvc might actually belong to  some other vrg instead of
		// this. If not found, then reconcile request would not be sent

Member:

The log line indicates a failure; the continue is a best-case continue, just in case this PVC belongs to a different VRG. We should not get an error when we attempt to get the selector.

@ELENAGER (Member Author), Dec 8, 2025:

@ShyamsundarR I didn't understand: do you suggest retaining or removing it?
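
For reference, the shape of the code under discussion, with hypothetical types standing in for the real ones: the conversion failure is logged, and the loop continues in case the PVC matches a different VRG.

import (
    "github.com/go-logr/logr"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
)

// vrgEntry is a stand-in for the per-VRG data the real code iterates over.
type vrgEntry struct {
    pvcSelector metav1.LabelSelector
    log         logr.Logger
}

func selectorsFor(vrgs []vrgEntry) []labels.Selector {
    var selectors []labels.Selector

    for _, vrg := range vrgs {
        selector, err := metav1.LabelSelectorAsSelector(&vrg.pvcSelector)
        if err != nil {
            // Best-effort continue: the PVC may belong to another VRG.
            vrg.log.Error(err, "Failed to get the label selector from VolumeReplicationGroup")

            continue
        }

        selectors = append(selectors, selector)
    }

    return selectors
}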

log1.Info("Found VolumeReplicationGroup with matching labels or owner",
"vrg", vrgNamespacedName.String(), "selector", selector,
"namespaces selected", pvcSelector.NamespaceNames,
"label match", labelMatch, "owner match", ownerMatch)
Member:

Looks like a debug log - do we log the same info later?

Member Author:

@nirs We are printing debug logs as well, so the Ramen log is overloaded with these messages! It causes wrap-around.
Here is an example from a relaxed QE setup:

	Line   590: 2025-10-29T11:37:26.448Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-1","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   591: 2025-10-29T11:37:26.449Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-1","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   596: 2025-10-29T11:37:26.527Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-2","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   597: 2025-10-29T11:37:26.527Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-2","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   601: 2025-10-29T11:37:26.553Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-6","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   602: 2025-10-29T11:37:26.553Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-6","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   605: 2025-10-29T11:37:26.720Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-7","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   606: 2025-10-29T11:37:26.721Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-7","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   611: 2025-10-29T11:37:26.764Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-3","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   612: 2025-10-29T11:37:26.765Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-3","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   616: 2025-10-29T11:37:26.788Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-4","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   617: 2025-10-29T11:37:26.789Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-4","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   620: 2025-10-29T11:37:26.826Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-5","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   621: 2025-10-29T11:37:26.827Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-5","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   626: 2025-10-29T11:37:26.855Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-8","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   627: 2025-10-29T11:37:26.856Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-8","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   631: 2025-10-29T11:37:26.874Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-10","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   632: 2025-10-29T11:37:26.875Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-10","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   635: 2025-10-29T11:37:26.905Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-9","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}
	Line   636: 2025-10-29T11:37:26.906Z	INFO	pvcmap.VolumeReplicationGroup	controller/volumereplicationgroup_controller.go:359	Found VolumeReplicationGroup with matching labels or owner	{"pvc": {"name":"busybox-pvc-9","namespace":"cg-app-1"}, "vrg": "cg-app-1", "vrg": "openshift-dr-ops/cg-app-1", "selector": "workloadpattern in (simple_io_pvc)", "namespaces selected": ["cg-app-1"], "label match": true, "owner match": false}

All of these are from a single second.

Member:

Looks like an overly verbose debug log; it's not clear why we need to know that we found a VRG for the PVC.

In general we need to log when we change the state of the system, for example when we start using a PVC, when the PVC state changes, etc.

Member Author:

Exactly, I can't see how this log helps us debug the system; that is why I suggest removing it.

Member:

This log line indicates why a VRG was reconciled, i.e., which PVC triggered the VRG reconcile.

This is useful in the steady state, where PVC events (create/update/delete) would be minimal, so knowing what triggered the VRG reconcile helps at times in troubleshooting.

The log example given is from when an application is relocated/failed over, or deployed with DR enabled for the first time, as then all PVCs are created and bound within a couple of reconciles.

Member Author:

@ShyamsundarR Usually issues don't occur in a steady state. This PR was created to prevent Ramen log wrap-around, after which we lose the logs completely and can't investigate the problem.

This message is printed for every PVC and can be very noisy. If you feel it’s important to keep it, I can revert this change.

Member:

@ELENAGER We need more data to tell how bad this log is. In general we want to delete (or move to debug level, hidden by default) logs that add little value and consume a lot of space, leading to wrapped log files.

The way I analyzed logs in the past was to import them into a SQLite database and then query the database.

This is the tool I used. It is built for vdsm logs but modifying it for ramen logs should not be hard.
https://github.com/oVirt/vdsm/blob/master/contrib/logdb

Example commit showing some queries:
http://github.com/oVirt/vdsm/commit/886224ada622fcfc5e5de1aaeae030ae40ad0556
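
A lighter-weight take on the same idea, sketched in Go rather than the SQLite tool above: count how many lines each message contributes, assuming the message is the fifth tab-separated field of a Ramen log line (as in the samples earlier in this thread).

package main

import (
    "bufio"
    "fmt"
    "os"
    "sort"
    "strings"
)

func main() {
    counts := map[string]int{}

    scanner := bufio.NewScanner(os.Stdin)
    scanner.Buffer(make([]byte, 1024*1024), 1024*1024)

    for scanner.Scan() {
        // Assumed layout: timestamp, level, logger, caller, message, fields...
        fields := strings.Split(scanner.Text(), "\t")
        if len(fields) >= 5 {
            counts[fields[4]]++
        }
    }

    type entry struct {
        msg string
        n   int
    }

    entries := make([]entry, 0, len(counts))
    for msg, n := range counts {
        entries = append(entries, entry{msg, n})
    }

    sort.Slice(entries, func(i, j int) bool { return entries[i].n > entries[j].n })

    for _, e := range entries {
        fmt.Printf("%8d  %s\n", e.n, e.msg)
    }
}

Feeding a wrapped log file to it on stdin prints the messages sorted by count, which is usually enough to spot the lines causing wrap-around.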

Member Author:

I retained this log line

CreateFunc: func(e event.CreateEvent) bool {
log.Info("Create event for PersistentVolumeClaim")
o := e.Object
log.Info("PVC Create event", "namespace", o.GetNamespace(), "name", o.GetName())
Member:

We can make the log shorter using "PVC created".

Member Author:

Done

}

log.Info("Update event for PersistentVolumeClaim")
log.Info("PVC Update event", "namespace", oldPVC.GetNamespace(), "name", oldPVC.GetName())
Member:

PVC updated

Member Author:

Done

DeleteFunc: func(e event.DeleteEvent) bool {
o := e.Object
log.Info("PVC Delete", "namespace", o.GetNamespace(), "name", o.GetName())
log.Info("PVC Delete event", "namespace", o.GetNamespace(), "name", o.GetName())
Member:

PVC deleted

Member Author:

Done

@ELENAGER force-pushed the improve_logs branch 6 times, most recently from 4a1e9c6 to f938449 on November 6, 2025 at 14:11
Comment on lines 193 to 196
log.Info("Create event for PersistentVolumeClaim")
o := e.Object
log.Info("PVC created", "namespace", o.GetNamespace(), "name", o.GetName())
@raaizik (Member), Nov 13, 2025:

You're using past-tense verbs like “created” and “deleted,” but these predicates run before reconciliation - they only observe Kubernetes events, not completed actions. Using phrasing like "Create event" reflects what’s actually happening.

Also, there’s no real need to log the object’s namespaced name at this stage - the reconciler logs that when processing the object, so removing it here would keep the logs cleaner and less noisy.

Suggested change:
- log.Info("Create event for PersistentVolumeClaim")
- o := e.Object
- log.Info("PVC created", "namespace", o.GetNamespace(), "name", o.GetName())
+ log.Info("Create event for PersistentVolumeClaim")

Member Author:

I think “PVC created” is accurate because by the time we receive the ADDED event, the PVC object already exists in the cluster (persisted in etcd). We’re reacting to a completed creation, not an in-progress action. Using “Create event” might imply it hasn’t happened yet.
Regarding the namespaced name, I’d prefer to keep it for easier debugging and correlation, but open to revisiting if logs become too noisy.

Member:

Retaining the "event" word in the log is useful, as we are receiving events, and when looking at logs it is more easily understandable.

But more broadly, we should drop all logs in this function, and just rely on logs from the pvcMapFunc which decides if a PVC is of interest to reconcile a VRG.

Logs here to date never carried the PVC NamespaceName, and most often, for debugging, the log regarding which VRG was triggered for a reconcile was more useful (as that had the PVC name).

As this is a global system wide PVC watcher, we should drop the logs in this function.

Member Author:

The word "event" is retained. I still prefer to keep the event logs, in spite of having the logs from the pvcMapFunc.

}

log.Info("Update event for PersistentVolumeClaim")
log.Info("PVC updated", "namespace", oldPVC.GetNamespace(), "name", oldPVC.GetName())
Member:

Same.

Suggested change:
- log.Info("PVC updated", "namespace", oldPVC.GetNamespace(), "name", oldPVC.GetName())
+ log.Info("Update event for PersistentVolumeClaim")

DeleteFunc: func(e event.DeleteEvent) bool {
o := e.Object
log.Info("PVC Delete", "namespace", o.GetNamespace(), "name", o.GetName())
log.Info("PVC deleted", "namespace", o.GetNamespace(), "name", o.GetName())
Member:

Same.

Suggested change:
- log.Info("PVC deleted", "namespace", o.GetNamespace(), "name", o.GetName())
+ log.Info("Delete event for PersistentVolumeClaim", "namespace", o.GetNamespace(), "name", o.GetName())
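
Putting these threads' outcome together as a sketch (the exact wording that landed in the PR may differ): keep the word "event", one short line per event, with the namespaced name for correlation.

import (
    "github.com/go-logr/logr"
    "sigs.k8s.io/controller-runtime/pkg/event"
    "sigs.k8s.io/controller-runtime/pkg/predicate"
)

func pvcPredicateFuncs(log logr.Logger) predicate.Funcs {
    return predicate.Funcs{
        CreateFunc: func(e event.CreateEvent) bool {
            log.Info("PVC create event",
                "namespace", e.Object.GetNamespace(), "name", e.Object.GetName())

            return true
        },
        UpdateFunc: func(e event.UpdateEvent) bool {
            log.Info("PVC update event",
                "namespace", e.ObjectNew.GetNamespace(), "name", e.ObjectNew.GetName())

            return true
        },
        DeleteFunc: func(e event.DeleteEvent) bool {
            log.Info("PVC delete event",
                "namespace", e.Object.GetNamespace(), "name", e.Object.GetName())

            return true
        },
    }
}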

@pruthvitd (Member) left a comment:

Smart tradeoff — reducing log noise while keeping critical events visible longer before rotation. Thanks for addressing this.


[]string{VRGConditionReasonUnused},
subconditions...)
msg = fmt.Sprintf("updated %s status to %s", conditionName, finalCondition.Status)
msg := fmt.Sprintf("updated %s status to %s", conditionName, finalCondition.Status)
Member:

I would get rid of this condition log printing; the in-memory condition is updated on the VRG once the callers are done, so having it in the logs may be superfluous.

There are times while inspecting logs when seeing how a condition progressed has been useful, but given the VRG status and the current state of the system, this log is more noise and can be removed.

Member Author:

@ShyamsundarR I honestly prefer to retain this log. I already deleted the duplicate line (see above), so we reduced the noise. Sometimes being able to follow the progression, and not only the final result, can be useful.
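
A possible middle ground for the noise concern, sketched with the upstream apimachinery helper rather than ramen's own condition utilities: log the update only when the condition actually changes, so steady-state reconciles stay quiet while progressions remain visible.

import (
    "github.com/go-logr/logr"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// meta.SetStatusCondition reports whether it changed anything,
// which lets us skip the log on no-op updates.
func setCondition(log logr.Logger, conditions *[]metav1.Condition, cond metav1.Condition) {
    if meta.SetStatusCondition(conditions, cond) {
        log.Info("updated condition",
            "type", cond.Type, "status", cond.Status, "reason", cond.Reason)
    }
}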

}

if slices.Contains(*vrg.Spec.ProtectedNamespaces, obj.GetNamespace()) {
log1.Info("Found VolumeReplicationGroup with matching namespace",
Member:

This log line again tells us which event triggered a VRG reconcile. It is generally useful to know this when looking at logs, as ideally only an external trigger should cause a VRG reconciliation (apart from any exponential backoff from a previous reconcile).

Member Author:

Retained


rgs, ok := obj.(*ramendrv1alpha1.ReplicationGroupSource)
if !ok {
log.Info("map function received not a replication group source resource")
Member:

We should drop other such log lines in the functions above this. While I have never seen this log, so removing it will not contribute to reducing log noise as such, it is not essential either.

Member Author:

Done
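
With the log dropped, the map function reduces to a silent type check; a sketch, where requestsForRGS is a hypothetical helper that builds the reconcile requests and the ramen API import path is assumed:

import (
    "context"

    ramendrv1alpha1 "github.com/ramendr/ramen/api/v1alpha1" // import path assumed
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func rgsMapFunc(ctx context.Context, obj client.Object) []reconcile.Request {
    rgs, ok := obj.(*ramendrv1alpha1.ReplicationGroupSource)
    if !ok {
        // Not a ReplicationGroupSource: enqueue nothing, log nothing.
        return nil
    }

    return requestsForRGS(rgs)
}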

@ELENAGER force-pushed the improve_logs branch 3 times, most recently from 1824948 to e9e2fe9 on December 9, 2025 at 14:18
@ELENAGER force-pushed the improve_logs branch 3 times, most recently from 08496f3 to 7e0f2e1 on December 16, 2025 at 16:02
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>