Add alerts to catch Knative TestGrid pods not running #1066

@michelle192837

Description

The Knative TestGrid pods are stuck in CrashLoopBackOff due to a permissions issue when reading the config, e.g.:

jsonPayload: {
error: "observe config: can't read "gs://knative-own-testgrid/config": open: Get "https://storage.googleapis.com/knative-own-testgrid/config": compute: Received 403 `Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
For more information, refer to the Workload Identity documentation:
	https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to

`"
file: "cmd/summarizer/main.go:151"
func: "main.main"
level: "error"
msg: "Could not summarize"
}
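
For reference, a rough way to spot this state by hand until a real alert exists. This is only a sketch: `crashlooping_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods --no-headers` output on stdin; the column
# positions are an assumption based on kubectl's default output.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage sketch (the knative namespace is the one used by prod/knative/*.yaml):
#   kubectl get pods --namespace knative --no-headers | crashlooping_pods
```

Any non-empty output means at least one pod is crash-looping. A proper alert would more likely key off restart counts or pod readiness in the monitoring stack rather than a manual check like this.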

I ran https://github.com/GoogleCloudPlatform/testgrid/blob/master/cluster/bind-service-accounts.sh to check whether any of the service accounts (SAs) needed to be re-bound, and the answer was yes for knative/summarizer, knative/tabulator, and knative/updater:

./bind-service-accounts.sh
Service accounts:
./canary/api.yaml:    iam.gke.io/gcp-service-account: testgrid-canary-api@k8s-testgrid.iam.gserviceaccount.com
./canary/api.yaml:  namespace: testgrid-canary
./canary/api.yaml:      serviceAccountName: api
./canary/config_merger.yaml:    iam.gke.io/gcp-service-account: testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
./canary/config_merger.yaml:  namespace: testgrid-canary
./canary/config_merger.yaml:      serviceAccountName: config-merger
./canary/monitoring.yaml:  namespace: testgrid-canary
./canary/summarizer.yaml:    iam.gke.io/gcp-service-account: testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
./canary/summarizer.yaml:  namespace: testgrid-canary
./canary/summarizer.yaml:      serviceAccountName: summarizer
./canary/tabulator.yaml:    iam.gke.io/gcp-service-account: testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
./canary/tabulator.yaml:  namespace: testgrid-canary
./canary/tabulator.yaml:      serviceAccountName: tabulator
./canary/updater.yaml:    iam.gke.io/gcp-service-account: testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
./canary/updater.yaml:  namespace: testgrid-canary
./canary/updater.yaml:      serviceAccountName: updater
./prod/config_merger.yaml:    iam.gke.io/gcp-service-account: updater@k8s-testgrid.iam.gserviceaccount.com
./prod/config_merger.yaml:  namespace: testgrid
./prod/config_merger.yaml:      serviceAccountName: config-merger
./prod/knative/summarizer.yaml:    iam.gke.io/gcp-service-account: testgrid-updater@knative-tests.iam.gserviceaccount.com
./prod/knative/summarizer.yaml:  namespace: knative
./prod/knative/summarizer.yaml:      serviceAccountName: summarizer
./prod/knative/tabulator.yaml:    iam.gke.io/gcp-service-account: testgrid-updater@knative-tests.iam.gserviceaccount.com
./prod/knative/tabulator.yaml:  namespace: knative
./prod/knative/tabulator.yaml:      serviceAccountName: tabulator
./prod/knative/updater.yaml:    iam.gke.io/gcp-service-account: testgrid-updater@knative-tests.iam.gserviceaccount.com
./prod/knative/updater.yaml:  namespace: knative
./prod/knative/updater.yaml:      serviceAccountName: updater
./prod/monitoring.yaml:  namespace: testgrid
./prod/README.md:1. Bind the service account(s) for the component in the `testgrid-canary` namespace:
./prod/README.md:1. Bind the service account(s) for the component in the `testgrid` namespace:
./prod/summarizer.yaml:    iam.gke.io/gcp-service-account: updater@k8s-testgrid.iam.gserviceaccount.com
./prod/summarizer.yaml:  namespace: testgrid
./prod/summarizer.yaml:      serviceAccountName: summarizer
./prod/tabulator.yaml:    iam.gke.io/gcp-service-account: updater@k8s-testgrid.iam.gserviceaccount.com
./prod/tabulator.yaml:  namespace: testgrid
./prod/tabulator.yaml:      serviceAccountName: tabulator
./prod/updater.yaml:    iam.gke.io/gcp-service-account: updater@k8s-testgrid.iam.gserviceaccount.com
./prod/updater.yaml:  namespace: testgrid
./prod/updater.yaml:      serviceAccountName: updater
./setup.sh:echo -n 'testgrid namespace: ' >&2
NOOP: testgrid-canary/config-merger has workloadIdentityUser access to testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid-canary/summarizer has workloadIdentityUser access to testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid-canary/tabulator has workloadIdentityUser access to testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid-canary/updater has workloadIdentityUser access to testgrid-canary@k8s-testgrid.iam.gserviceaccount.com
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer] roles/iam.workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding testgrid-updater@knative-tests.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]'
Updated IAM policy for serviceAccount [testgrid-updater@knative-tests.iam.gserviceaccount.com].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u1cNwo=
version: 1
DONE: gave knative/summarizer workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator] roles/iam.workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding testgrid-updater@knative-tests.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]'
Updated IAM policy for serviceAccount [testgrid-updater@knative-tests.iam.gserviceaccount.com].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u2Rpkc=
version: 1
DONE: gave knative/tabulator workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/updater] roles/iam.workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding testgrid-updater@knative-tests.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]'
Updated IAM policy for serviceAccount [testgrid-updater@knative-tests.iam.gserviceaccount.com].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u4Lseg=
version: 1
DONE: gave knative/updater workloadIdentityUser access to testgrid-updater@knative-tests.iam.gserviceaccount.com
NOOP: testgrid/config-merger has workloadIdentityUser access to updater@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid/summarizer has workloadIdentityUser access to updater@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid/tabulator has workloadIdentityUser access to updater@k8s-testgrid.iam.gserviceaccount.com
NOOP: testgrid/updater has workloadIdentityUser access to updater@k8s-testgrid.iam.gserviceaccount.com
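
The member strings in the output above follow the pattern `serviceAccount:<cluster project>.svc.id.goog[<namespace>/<Kubernetes service account>]`. A minimal sketch for checking one binding without re-running the whole script; `wi_member` and `has_binding` are hypothetical helper names, not part of bind-service-accounts.sh, and `has_binding` only looks for the member anywhere in the policy rather than verifying it sits under `roles/iam.workloadIdentityUser`.

```shell
# Build the Workload Identity member string for a Kubernetes service account.
# Arguments: cluster project, namespace, Kubernetes service account name.
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Succeed if the IAM policy YAML on stdin lists the given member.
# Simplification: this does not check which role the member is bound to.
has_binding() {
  grep -qF -- "- $1"
}

# Usage sketch, using the names from the output above:
#   gcloud iam service-accounts get-iam-policy \
#     testgrid-updater@knative-tests.iam.gserviceaccount.com \
#     --project knative-tests \
#     | has_binding "$(wi_member k8s-testgrid knative summarizer)" \
#     || echo "knative/summarizer is not bound"
```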
