88 changes: 53 additions & 35 deletions modules/log-collector-resources-scheduling.adoc
@@ -17,70 +17,88 @@ Administrators can change the resources and scheduling of the collector by confi

.Procedure

. Update the `ClusterLogForwarder` CR:
. Update the `ClusterLogForwarder` CR to configure scheduling and resources.
+
The following example displays `ClusterLogForwarder` CR YAML:
The following example schedules collectors on infrastructure nodes:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
name: <name>
namespace: <namespace>
name: instance
namespace: openshift-logging
spec:
collector:
nodeSelector:
collector: needed
node-role.kubernetes.io/infra: ""
# ...
----
+
The following example schedules collectors on dedicated infrastructure nodes with taints:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collector:
    nodeSelector:
      node-role.kubernetes.io/infra: ""
    tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoExecute
# ...
----
+
The following example shows all available scheduling and resource fields:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
collector:
nodeSelector:
node-role.kubernetes.io/infra: ""
resources:
limits:
memory: 1Gi
requests:
cpu: 100m
memory: 1Gi
tolerations:
- key: "logging"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
- key: node-role.kubernetes.io/infra
operator: Exists
effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: label-1
- key: node-role.kubernetes.io/infra
operator: Exists
weight: 1
podAffinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: test
- key: app.kubernetes.io/component
operator: In
values:
- value1
topologyKey: kubernetes.io/hostname
weight: 50
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: run
operator: In
values:
- test
namespaceSelector: {}
topologyKey: kubernetes.io/hostname
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S2
topologyKey: topology.kubernetes.io/zone
- collector
topologyKey: topology.kubernetes.io/zone
weight: 100
# ...
----
58 changes: 7 additions & 51 deletions modules/logging-loki-pod-placement.adoc
@@ -58,7 +58,7 @@ where:

In the earlier example configuration, all Loki pods are moved to nodes containing the `node-role.kubernetes.io/infra: ""` label.

The following example displays `LokiStack` CR with node selectors and tolerations:
The following example shows a `LokiStack` CR with node selectors and tolerations for dedicated infrastructure nodes that have taints. The example configures three components; the same pattern applies to all components:
[source,yaml]
----
apiVersion: loki.grafana.com/v1
@@ -89,56 +89,6 @@ spec:
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
indexGateway:
nodeSelector:
node-role.kubernetes.io/infra: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
value: reserved
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
ingester:
nodeSelector:
node-role.kubernetes.io/infra: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
value: reserved
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
querier:
nodeSelector:
node-role.kubernetes.io/infra: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
value: reserved
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
queryFrontend:
nodeSelector:
node-role.kubernetes.io/infra: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
value: reserved
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
ruler:
nodeSelector:
node-role.kubernetes.io/infra: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
value: reserved
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
gateway:
nodeSelector:
node-role.kubernetes.io/infra: ""
@@ -149,9 +99,15 @@ spec:
- effect: NoExecute
key: node-role.kubernetes.io/infra
value: reserved
# ... repeat for indexGateway, ingester, querier, queryFrontend, ruler
# ...
----

[NOTE]
====
Apply the same `nodeSelector` and `tolerations` configuration to all LokiStack components: `compactor`, `distributor`, `gateway`, `indexGateway`, `ingester`, `querier`, `queryFrontend`, and `ruler`.
====

To configure the `nodeSelector` and `tolerations` fields of the `LokiStack` custom resource (CR), use the [command]`oc explain` command to view the description and fields for a particular resource:

[source,terminal]
70 changes: 70 additions & 0 deletions modules/logging-scheduling-use-cases.adoc
@@ -0,0 +1,70 @@
// Module included in the following assemblies:
//
// * scheduling_resources/scheduling-logging-resources.adoc

:_mod-docs-content-type: CONCEPT
[id="logging-scheduling-use-cases_{context}"]
= Scheduling use cases for logging components

[role="_abstract"]
Different deployment scenarios require different scheduling approaches. Use this guide to determine which scheduling mechanism to use for your logging infrastructure.

The following table describes common use cases and the scheduling mechanisms to apply:

.Scheduling mechanisms by use case
[cols="2,1,1,1,1",options="header"]
|===
|Use case |Node selectors |Taints and tolerations |Affinity rules |Resource limits

|Schedule logging on infrastructure nodes
|Required
|Optional
|Not required
|Optional

|Dedicate nodes exclusively to logging
|Required
|Required
|Not required
|Optional

|Distribute logging across availability zones
|Not required
|Not required
|Required
|Not required

|Tune logging performance and resource usage
|Not required
|Not required
|Not required
|Required

|===

== Infrastructure nodes

When you have dedicated infrastructure nodes labeled with `node-role.kubernetes.io/infra`, use node selectors to schedule logging components on those nodes. This separates logging workloads from application workloads, which optimizes costs and maintains clear operational boundaries.

To keep other workloads off the infrastructure nodes, apply taints to those nodes and configure matching tolerations on the logging pods. Only pods that tolerate a taint can be scheduled on a tainted node, so the node resources stay reserved for logging and for any other workloads that you explicitly allow.
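
As a minimal sketch of how a taint and a toleration pair up, assume that an infrastructure node carries a `node-role.kubernetes.io/infra` taint with the value `reserved` and the `NoSchedule` effect. The value and effect are illustrative assumptions; use the taints that your nodes actually have. The first fragment shows the taint as it appears in the `Node` object:

[source,yaml]
----
# Fragment of a Node object. The taint repels pods that do not tolerate it.
spec:
  taints:
  - key: node-role.kubernetes.io/infra
    value: reserved
    effect: NoSchedule
----

The second fragment shows a toleration that matches it:

[source,yaml]
----
# Fragment of a pod or CR scheduling configuration.
tolerations:
- key: node-role.kubernetes.io/infra
  operator: Exists # matches the key regardless of the taint value
  effect: NoSchedule
----

A toleration that omits `operator` and sets `value: reserved` also matches this taint, because the default operator compares the key and value for equality.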

== High availability across zones

In multi-zone clusters, use pod anti-affinity rules to distribute LokiStack components across availability zones. This maintains logging availability during zone failures and meets business continuity requirements.

For example, configure anti-affinity to prevent multiple ingester pods from running in the same zone. If one zone fails, the remaining zones continue to process logs.
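
The following fragment is a minimal sketch of such a rule, not a verified configuration. It assumes that your {loki-op} version exposes a `podAntiAffinity` field under `spec.template.ingester` and that ingester pods carry the `app.kubernetes.io/component: ingester` label. Confirm both assumptions on your cluster, for example with `oc explain` and `oc get pods --show-labels`, before you rely on it. The `logging-loki` name is an example.

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
        # Prefer placing each ingester pod in a zone without another ingester.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: ingester # assumed pod label
            topologyKey: topology.kubernetes.io/zone
# ...
----

A preferred rule still lets the scheduler place two ingesters in one zone when there are more replicas than zones. A `requiredDuringSchedulingIgnoredDuringExecution` rule enforces the spread strictly, but pods that cannot satisfy it remain in the `Pending` state.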

== Performance tuning

When you experience high log volume or performance issues, adjust the CPU and memory requests and limits for collector pods. Higher limits allow collectors to handle more throughput, while an explicit upper bound prevents logging from consuming excessive node resources.

Monitor collector resource usage and adjust limits based on actual consumption and node capacity.
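
As a sketch, the following `ClusterLogForwarder` fragment raises the collector requests and limits. The values are illustrative assumptions, not recommendations; derive your own from observed usage. The CR name and namespace are examples.

[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collector:
    resources:
      requests:
        cpu: 500m   # illustrative value
        memory: 1Gi # illustrative value
      limits:
        memory: 2Gi # illustrative value
# ...
----

Requests determine which nodes the scheduler considers for a collector pod, so requests that are too high can leave pods in the `Pending` state on small nodes.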

== Verification

After configuring scheduling rules, verify that pods are running on the expected nodes:

* For collectors, use the `oc get pods` command with the `--selector` and `-o wide` flags to view pod placement.
* For LokiStack components, run the same command for each component type and check the `STATUS` and `NODE` columns.

If pods are not scheduled as expected, check node labels, taints, and pod tolerations. Verify that the scheduling configuration matches your cluster's node configuration.
99 changes: 99 additions & 0 deletions modules/troubleshooting-logging-pod-scheduling.adoc
@@ -0,0 +1,99 @@
// Module included in the following assemblies:
//
// * scheduling_resources/scheduling-logging-resources.adoc

:_mod-docs-content-type: PROCEDURE
[id="troubleshooting-logging-pod-scheduling_{context}"]
= Troubleshooting logging pod scheduling

[role="_abstract"]
If logging pods are not scheduled on the expected nodes or remain in a pending state, verify the node labels, taints, and pod scheduling configuration.

.Prerequisites

* You have administrator permissions.
* You have installed the {clo} or {loki-op}.

.Procedure

. Check the pod status to identify scheduling issues:
+
[source,terminal]
----
$ oc get pods -n openshift-logging -o wide
----
+
Pods that cannot be scheduled display a `Pending` status.

. Describe the pod to view scheduling events:
+
[source,terminal]
----
$ oc describe pod <pod-name> -n openshift-logging
----
+
Review the `Events` section for messages such as:
+
* `0/X nodes are available: X node(s) didn't match Pod's node affinity/selector`
* `0/X nodes are available: X node(s) had untolerated taint`
* `0/X nodes are available: Insufficient cpu, Insufficient memory`

. Verify that target nodes have the required labels:
+
[source,terminal]
----
$ oc get nodes --show-labels
----
+
Confirm that nodes intended for logging have the labels specified in the `nodeSelector` configuration.

. If you use taints and tolerations, verify the node taints:
+
[source,terminal]
----
$ oc describe node <node-name>
----
+
Review the `Taints` section and confirm that logging pods have matching tolerations configured.

. Verify the pod's scheduling configuration:
+
For collector pods, check the `ClusterLogForwarder` custom resource:
+
[source,terminal]
----
$ oc get clusterlogforwarder <name> -n <namespace> -o yaml
----
+
For LokiStack pods, check the `LokiStack` custom resource:
+
[source,terminal]
----
$ oc get lokistack logging-loki -n openshift-logging -o yaml
----

. Correct any mismatches between the pod configuration and node labels or taints:
+
* If node labels are missing, add them:
+
[source,terminal]
----
$ oc label node <node-name> <key>=<value>
----
+
* If the pod's `nodeSelector` contains a typo, update the custom resource with the correct label.
+
* If the pod has no toleration for a node taint, add the matching toleration to the custom resource.

. After making corrections, verify that the pods are scheduled:
+
[source,terminal]
----
$ oc get pods -n openshift-logging -o wide
----
+
The pods move to the `Running` status on the expected nodes.

.Verification

* Confirm that logging pods are running on the intended nodes by checking the `NODE` column in the pod list.
4 changes: 4 additions & 0 deletions scheduling_resources/scheduling-logging-resources.adoc
@@ -11,12 +11,16 @@ You can schedule logging resources by defining node selectors, taints and tolera

include::modules/logging-about-pod-scheduling-controls.adoc[leveloffset=+1]

include::modules/logging-scheduling-use-cases.adoc[leveloffset=+1]

include::modules/log-collector-resources-scheduling.adoc[leveloffset=+1]

include::modules/cluster-logging-collector-pod-location.adoc[leveloffset=+1]

include::modules/logging-loki-pod-placement.adoc[leveloffset=+1]

include::modules/troubleshooting-logging-pod-scheduling.adoc[leveloffset=+1]

[role="_additional-resources"]
== Additional resources
