Add dclm-core-22 to jupiter#42

Open
harshraj172 wants to merge 5 commits into main from harsh/dclm-core-22-jupiter

@harshraj172
Collaborator

To add lighteval for dclm on JUPITER, we had to add a separate jupiter-lighteval.def, because including the lighteval installation in the existing .def file for JUPITER caused conflicts. The underlying reason is that JUPITER uses ARM64, and many of lighteval's runtime dependencies (spacy, underthesea, pyvi, etc.) lack pre-built aarch64 wheels.

timurcarstensen and others added 5 commits February 7, 2026 10:05
…ainers/

Pass --arch arm64 to apptainer build for the jupiter (GH200) container
while keeping the default x86_64 arch for the others. Also renames the
apptainer/ directory to containers/ and updates all references.

Co-authored-by: Cursor <cursoragent@cursor.com>
apptainer --arch arm64 on an x86 host needs binfmt_misc + QEMU to
execute arm64 binaries during the %post scriptlet.

Co-authored-by: Cursor <cursoragent@cursor.com>
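
The binfmt_misc + QEMU requirement from the commit message above can be checked before starting a cross-architecture build. The following is a minimal sketch, not code from this PR: the function name is hypothetical, and it assumes the arm64 handler is registered under the conventional name `qemu-aarch64` (as common QEMU user-mode packages do), which may differ on a given build host.

```python
from pathlib import Path


def arm64_emulation_registered(
    binfmt_dir="/proc/sys/fs/binfmt_misc",  # standard mount point on Linux
    handler="qemu-aarch64",  # assumed handler name; varies by setup
):
    """Return True if a binfmt_misc handler for arm64 exists and is enabled."""
    entry = Path(binfmt_dir) / handler
    try:
        # The first line of a binfmt_misc entry is "enabled" or "disabled".
        first_line = entry.read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        # Missing entry, unreadable file, or empty file: treat as not registered.
        return False
    return first_line == "enabled"


if __name__ == "__main__":
    if not arm64_emulation_registered():
        print("No enabled qemu-aarch64 binfmt_misc handler found; "
              "an arm64 %post scriptlet would likely fail on this x86 host.")
```

Running a check like this at the start of the build job would surface a missing emulation setup early instead of partway through the container build.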
Collaborator

@geoalgo left a comment

LGTM, but there are some conflicts to resolve.

- task: include_base_44_ukrainian
subset: Ukrainian

dclm-core-22:

Could you add the dataset field to either the task group or the individual tasks? Otherwise, dataset pre-downloading (before job submission) will fail or be unavailable, and the jobs will then fail unless the compute nodes have internet access. See global-mmlu-eu or generic-multilingual for an example of how this looks.
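
A check along the following lines could catch a missing dataset field before job submission. This is only a sketch under assumptions: the layout of the task-group config (a `tasks` list of mappings, a `dataset` key on the group or on each task) is inferred from this comment, and the function name, task name, and dataset name are hypothetical placeholders.

```python
def groups_missing_dataset(task_groups):
    """Return names of task groups with no dataset on the group or on every task.

    `task_groups` maps a group name to a dict. The key names used here
    ("dataset", "tasks") are assumptions about the config layout.
    """
    missing = []
    for name, group in task_groups.items():
        if "dataset" in group:
            continue  # dataset declared once at group level
        tasks = group.get("tasks", [])
        if tasks and all(isinstance(t, dict) and "dataset" in t for t in tasks):
            continue  # every task declares its own dataset
        missing.append(name)
    return missing


# Hypothetical example data, not taken from the repository.
example = {
    "dclm-core-22": {"tasks": [{"task": "example_task"}]},
    "generic-multilingual": {
        "tasks": [{"task": "example_task", "dataset": "example-org/example-dataset"}]
    },
}
print(groups_missing_dataset(example))  # ['dclm-core-22']
```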


build-push:
needs: setup-lambda
strategy:

We moved to SkyPilot + Lambda Labs for the container build workflow, so the actual workflows are now split between .github/workflows and .github/sky.

@timurcarstensen
Collaborator

Thanks for the contribution and for making this work on JUPITER, @harshraj172! Could you update your branch with the latest changes from main? We changed a few things about the GitHub Actions workflows (I left you a comment about that) and about testing.

SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1"
EVAL_CONTAINER_IMAGE: "lm-eval-jupiter.sif"
LIGHTEVAL_CONTAINER_IMAGE: "lighteval-jupiter.sif"
SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1 --env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"

You'll probably also have to integrate this into the container downloading logic in oellm/utils.py::_ensure_singularity_image; otherwise it'll only download the lm-eval container.
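
One way the download logic could pick up the new image is to collect every configured container image instead of a single hard-coded one. The sketch below is an illustration only: the actual signature and behavior of `_ensure_singularity_image` are not shown in this thread, and the helper name and the convention of matching keys ending in `_CONTAINER_IMAGE` are assumptions based on the two variables in the diff above.

```python
def container_images(cluster_env):
    """Collect the distinct container image names configured for a cluster.

    Assumes image variables follow the `*_CONTAINER_IMAGE` naming seen in
    the diff (EVAL_CONTAINER_IMAGE, LIGHTEVAL_CONTAINER_IMAGE).
    """
    images = []
    for key, value in cluster_env.items():
        if key.endswith("_CONTAINER_IMAGE") and value and value not in images:
            images.append(value)  # keep config order, skip duplicates
    return images


jupiter_env = {
    "EVAL_CONTAINER_IMAGE": "lm-eval-jupiter.sif",
    "LIGHTEVAL_CONTAINER_IMAGE": "lighteval-jupiter.sif",
    "SINGULARITY_ARGS": "--nv --contain --env PYTHONNOUSERSITE=1",
}
print(container_images(jupiter_env))
# ['lm-eval-jupiter.sif', 'lighteval-jupiter.sif']
```

The existing download step could then loop over this list, so clusters that define only EVAL_CONTAINER_IMAGE keep their current behavior.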
