Skip to content

feat: bucket set-up#211

Draft
RaidedCluster wants to merge 1 commit into
forecastingresearch:mainfrom
RaidedCluster:johan/external-submissions-v2
Draft

feat: bucket set-up#211
RaidedCluster wants to merge 1 commit into
forecastingresearch:mainfrom
RaidedCluster:johan/external-submissions-v2

Conversation

@RaidedCluster

Copy link
Copy Markdown
Collaborator

Adds Makefile, setup_permissions.sh, and variables.mk.example to automate IAM setup for the external submissions pipeline.

@RaidedCluster RaidedCluster force-pushed the johan/external-submissions-v2 branch 2 times, most recently from c8b322f to 5046302 Compare June 9, 2026 12:45
Comment thread src/helpers/utils.py Outdated
@@ -0,0 +1,39 @@
"""Submission pipeline date utilities.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this file submissions.py

Comment thread src/helpers/email.py Outdated
Max submissions: 3 per round (one per model).

Validate your file before submitting:
{VALIDATE_URL}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validation is not here yet, so delete these lines

Comment thread src/helpers/email.py Outdated

Team: {display_name}
Upload folder: gs://{upload_bucket}/{team_id}/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention the next forecast due date

Comment thread src/helpers/utils.py Outdated
import os
from datetime import datetime, timezone

BUILD_ENV = os.environ.get("BUILD_ENV", "prod")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/external-submissions/PIPELINE.md Outdated
| `team_id` | `team1`, `team2`, ... — permanent internal ID, used as GCS folder name |
| `team_name` | Optional display name (unique). Shown in emails. |
| `organization` | Public name. `"Anonymous N"` if anonymous. |
| `original_organization` | Always the real org name. Never shown publicly. |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call deanonymized_organization

FRI_EMAIL=forecastbench@forecastingresearch.org
VALIDATE_URL=

BUILD_ENV=prod

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use RunMode

SUBMISSIONS_BUCKET=<project>-submissions
SUBMISSIONS_INTERSTITIAL_BUCKET=<project>-submissions-interstitial
SUBMISSIONS_HISTORY_BUCKET=<project>-submissions-history
SUBMISSIONS_DEPLOYER=<your-email>@forecastingresearch.org

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put all of this in variables.mk in the root dir and delete this file

@@ -0,0 +1,75 @@
#!/bin/bash

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a permissions script, can you do this in Python using google-cloud-resource-manager?

Codex says something like:

from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, policy_pb2


def grant_role_on_folder(
    folder_id: str,
    member: str,
    role: str,
) -> None:
    """
    Example:
      folder_id = "123456789012"
      member = "serviceAccount:my-sa@my-project.iam.gserviceaccount.com"
      role = "roles/viewer"
    """
    client = resourcemanager_v3.FoldersClient()
    resource = f"folders/{folder_id}"

    policy = client.get_iam_policy(
        request=iam_policy_pb2.GetIamPolicyRequest(resource=resource)
    )

    for binding in policy.bindings:
        if binding.role == role:
            if member not in binding.members:
                binding.members.append(member)
            break
    else:
        policy.bindings.append(
            policy_pb2.Binding(
                role=role,
                members=[member],
            )
        )

    client.set_iam_policy(
        request=iam_policy_pb2.SetIamPolicyRequest(
            resource=resource,
            policy=policy,
        )
    )

@@ -0,0 +1,130 @@
#!/bin/bash

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost as to why there's a deploy.sh script when you have a Makefile


Each team gets a unique internal ID (team1, team2, ...) used as their GCS folder name.
One organization can have multiple teams (e.g. GDM has team1 and team2).
Teams can optionally provide a display name to distinguish multiple teams from the same org.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teams can optionally provide a display name to distinguish multiple teams from the same org.

This is not correct

@houtanb

houtanb commented Jun 9, 2026

Copy link
Copy Markdown
Member

The review above was mine. This is from Codex. I haven't looked into the details of it but skimmed it and it seems useful

  1. Critical: deploy paths are broken. make deploy-onboard copies missing files before deploy: src/helpers/
     submission_validation.py, functions/common/email.py, and functions/common/utils.py do not exist
     (forecastbench/src/external-submissions/Makefile:26). The documented ./src/external-submissions/
     functions/deploy.sh onboard path is also broken because ROOT_DIR=../../ is evaluated after the script cds
     into functions/, so helper copies point at the wrong directory (forecastbench/src/external-submissions/
     functions/deploy.sh:18, forecastbench/src/external-submissions/functions/deploy.sh:110). The script is
     also committed 100644, but docs invoke it directly (forecastbench/src/external-submissions/
     PIPELINE.md:63).

  2. Critical: setup-permissions likely fails. The Makefile passes PROCESSING_BUCKET=$(FORECAST_SETS_BUCKET),
     but the script requires FORECAST_SETS_BUCKET (forecastbench/src/external-submissions/Makefile:92,
     forecastbench/src/external-submissions/functions/setup_permissions.sh:13). My make -n setup-permissions
     confirmed the required variable is not passed.

  3. High: remove can report access revoked while access remains. _handle_remove catches IAM removal errors,
     then still marks the team inactive and returns "GCS access revoked" (forecastbench/src/external-
     submissions/functions/onboard/main.py:170). That is a security failure mode: deleted teams can retain
     bucket access.

  4. High: onboard can report success without granting upload access. _iam() catches and only prints IAM setup
     failures, while the API still returns 201 (forecastbench/src/external-submissions/functions/onboard/
     main.py:240). Since setting conditional IAM also requires uniform bucket-level access【turn1view0†L, any
     bucket missing that precondition silently creates a team that cannot upload.

  5. High: team IDs and anonymous names race. _next_team_id() and _next_anon_number() count existing Firestore
     docs outside a transaction (forecastbench/src/external-submissions/functions/onboard/main.py:34,
     forecastbench/src/external-submissions/functions/onboard/main.py:40). Concurrent requests can assign the
     same teamN and same GCS prefix.

  6. High: the promised folder workflow does not match GCS IAM semantics. The condition only grants object-
     prefix access (forecastbench/src/external-submissions/functions/onboard/main.py:103), but Cloud Storage
     says storage.objects.list is bucket-level and cannot be prefix-restricted with resource.name. The code/
     email tell users to use GCP Console and gcloud storage cp (forecastbench/src/external-submissions/
     functions/onboard/main.py:266, forecastbench/src/helpers/email.py:63), but Google documents list
     permissions as required for CLI/Console upload verification.

  7. High: non-Google email handling contradicts the docs. Docs say registration succeeds with a warning for
     non-Google accounts (forecastbench/src/external-submissions/PIPELINE.md:47), but IAM setup still includes
     those principals in the same binding (forecastbench/src/external-submissions/functions/onboard/
     main.py:238). If one invalid principal makes set_iam_policy fail, no one gets access, and the API still
     returns success.

  8. Medium: external teams get delete/overwrite powers. The code grants roles/storage.objectUser
     (forecastbench/src/external-submissions/functions/onboard/main.py:107), which includes delete/update
     permissions. Google notes delete is only needed for overwrites. If submissions should be auditable, use
     objectCreator/custom roles or object versioning/retention.

  9. Medium: side effects are not atomic. The handler creates .keep, writes Firestore, then mutates IAM and
     sends email (forecastbench/src/external-submissions/functions/onboard/main.py:220). Any partial failure
     leaves orphan folders, active teams without access, or emails sent before permissions are valid.

  10. Medium: secrets are deployed as plain env vars. SMTP_PASSWORD is passed through --set-env-vars
     (forecastbench/src/external-submissions/Makefile:13, forecastbench/src/external-submissions/functions/
     deploy.sh:43). Google recommends Secret Manager for sensitive Cloud Run/Functions config and warns env-
     var secrets can leak through misconfiguration/logging.

  11. Medium: shell scripts continue after failures. Neither script uses set -euo pipefail; failed cp, gcloud,
     or rm commands can be followed by "deployed" / "Done" messages (forecastbench/src/external-submissions/
     functions/deploy.sh:60, forecastbench/src/external-submissions/functions/setup_permissions.sh:22).

  12. Low/medium: docs and filename contract disagree. API instructions say {round_date}.{organization}.
     {N}.json (forecastbench/src/external-submissions/functions/onboard/main.py:269); email says
     {forecast_due_date}.{organization}.{N}.json (forecastbench/src/helpers/email.py:62). Existing code uses
     forecast_due_date.

  13. Low: CORS preflight omits Authorization. Browser calls with bearer tokens will fail preflight because
     only Content-Type is allowed (forecastbench/src/external-submissions/functions/onboard/main.py:140).

  14. Low: no tests cover this. There are no tests for team ID allocation, IAM failure behavior, removal,
     validation, or deploy preflight. This code is mostly cloud side effects, so fake Firestore/GCS tests
     would catch the biggest regressions.

@houtanb houtanb marked this pull request as draft June 9, 2026 12:47
@RaidedCluster RaidedCluster force-pushed the johan/external-submissions-v2 branch 5 times, most recently from f04f5df to bcc2b44 Compare June 10, 2026 06:18
@RaidedCluster RaidedCluster force-pushed the johan/external-submissions-v2 branch from bcc2b44 to 2f61ff4 Compare June 10, 2026 06:24
RUN_MODE=$(RUN_MODE),\
SMTP_USER=$(SMTP_USER),\
SMTP_PASSWORD=$(SMTP_PASSWORD),\
UPLOAD_BUCKET=$(SUBMISSIONS_BUCKET),\

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPLOAD_BUCKET should be called SUBMISSIONS_BUCKET as the first one is not informative.

also, you need to modify variables.example.mk to include any variables that need to be set

#!/usr/bin/env python3
"""Set up IAM permissions for the ForecastBench submissions service account.

Run once per environment after creating the service account:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "environment" mean here?


Run once per environment after creating the service account:

eval $(cat ../../variables.mk | grep -v '^#' | xargs) python setup_permissions.py

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove grep -v '^#' |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this comment altogether

eval $(cat ../../variables.mk | grep -v '^#' | xargs) python setup_permissions.py

Requirements:
pip install google-cloud-storage google-cloud-resource-manager

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these two lines

UPLOAD_BUCKET=$(SUBMISSIONS_BUCKET),\
NEXT_DUE_DATE=$(NEXT_DUE_DATE)

deploy-onboard:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the files this deplooyment depends on should be on this line

NEXT_DUE_DATE=$(NEXT_DUE_DATE)

deploy-onboard:
cp $(ROOT_DIR)src/helpers/email.py $(FUNC_DIR)/onboard/email_utils.py

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this file renamed in the copy?

--gen2 \
--project=$(CLOUD_PROJECT) \
--region=$(REGION) \
--runtime=python312 \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why python 3.12?

--set-env-vars=$(ONBOARD_ENV_VARS)
rm -f $(FUNC_DIR)/onboard/email_utils.py

setup-permissions:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this rule

| `service_accounts` | GCP service accounts for automated uploads. No emails sent to these. |
| `anonymous` | bool |
| `created_at` | Firestore server timestamp |
| `active` | bool — set to false on removal |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the above list add two more columns showing what's required vs optional and the default value when optional. e.g., anonymous is optional, default false

Comment thread src/helpers/email.py
SMTP_USER = os.environ.get("SMTP_USER", "")
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD", "")
SMTP_HOST = os.environ.get("SMTP_HOST", "smtp.gmail.com")
SMTP_PORT = int(os.environ.get("SMTP_PORT", "587"))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do that in src/helpers/env.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants