…ainers/ Pass --arch arm64 to apptainer build for the jupiter (GH200) container while keeping the default x86_64 arch for the others. Also renames the apptainer/ directory to containers/ and updates all references. Co-authored-by: Cursor <cursoragent@cursor.com>
apptainer --arch arm64 on an x86 host needs binfmt_misc + QEMU to execute arm64 binaries during the %post scriptlet. Co-authored-by: Cursor <cursoragent@cursor.com>
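The per-cluster arch selection in these commits can be sketched in Python as follows. This is not the PR's build script. The `CLUSTER_ARCH` mapping, the `build_command` and `can_emulate_arm64` helpers, and any file names passed to them are hypothetical. Only the `--arch arm64` flag for jupiter and the binfmt_misc + QEMU requirement come from the commits. Registered binfmt_misc handlers appear under `/proc/sys/fs/binfmt_misc/`. The handler name `qemu-aarch64` is an assumption: qemu-user-static commonly registers it under that name, but other setups may use a different one.

```python
import platform
from pathlib import Path

# Hypothetical mapping: only jupiter (GH200) gets an arm64 image;
# every other cluster keeps apptainer's default x86_64 build.
CLUSTER_ARCH = {"jupiter": "arm64"}


def build_command(cluster: str, def_file: str, sif_file: str) -> list[str]:
    """Return an apptainer build command, adding --arch only for arm64 clusters."""
    cmd = ["apptainer", "build"]
    arch = CLUSTER_ARCH.get(cluster)  # None -> default arch, no flag added
    if arch is not None:
        cmd += ["--arch", arch]
    # apptainer build takes the output image first, then the build spec.
    cmd += [sif_file, def_file]
    return cmd


def can_emulate_arm64(binfmt_dir: str = "/proc/sys/fs/binfmt_misc") -> bool:
    """True if the host is aarch64 already or has an assumed qemu-aarch64 handler.

    Without such a handler, an x86 host cannot run the arm64 binaries
    that the %post scriptlet executes during the build.
    """
    if platform.machine() in ("aarch64", "arm64"):
        return True
    return Path(binfmt_dir, "qemu-aarch64").exists()
```

Calling `build_command("jupiter", "jupiter.def", "lm-eval-jupiter.sif")` yields a command that includes `--arch arm64`. Any other cluster name yields the plain default-arch command.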
geoalgo left a comment
LGTM, but there are some conflicts to resolve.
- task: include_base_44_ukrainian
  subset: Ukrainian

dclm-core-22:
Could you add the dataset field to either the task group or its tasks? Otherwise, dataset pre-downloading (before job submission) will fail or be unavailable, and the jobs will then fail unless the compute nodes have internet access. See global-mmlu-eu or generic-multilingual for an example of how this looks.
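The pre-download requirement can be sketched as a check that runs before job submission. Each task must resolve a dataset, either from its own entry or from its task group. The dict layout below is a hypothetical simplification, not the actual task-group schema: a `tasks` list, with optional `dataset` keys at both the group and task level. global-mmlu-eu and generic-multilingual remain the reference examples.

```python
def tasks_missing_dataset(task_groups: dict) -> list[str]:
    """Return 'group/task' names with no dataset at either the task or group level.

    Assumed (hypothetical) layout:
        {group_name: {"dataset": optional, "tasks": [{"task": name, "dataset": optional}]}}
    """
    missing = []
    for group_name, group in task_groups.items():
        group_dataset = group.get("dataset")  # group-level default, may be absent
        for task in group.get("tasks", []):
            # A task-level dataset overrides the group default; either one suffices.
            if not (task.get("dataset") or group_dataset):
                missing.append(f"{group_name}/{task['task']}")
    return missing
```

If this returns a non-empty list, the datasets for those tasks cannot be fetched ahead of time. Failing early, before submission, is cheaper than discovering the problem on a compute node without internet access.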
build-push:
  needs: setup-lambda
  strategy:
We moved to SkyPilot + Lambda Labs for the container build workflow, so the actual workflows are now split between .github/workflows and .github/sky.
Thanks for the contribution and for making this work on Jupiter, @harshraj172! Could you update your branch with the latest changes from main? We changed a few things in the GitHub Actions workflows (I left you a comment about that) and in testing.
SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1"
EVAL_CONTAINER_IMAGE: "lm-eval-jupiter.sif"
LIGHTEVAL_CONTAINER_IMAGE: "lighteval-jupiter.sif"
SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1 --env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
You'll probably also have to integrate this into the container-downloading logic in oellm/utils.py::_ensure_singularity_image; otherwise, it will only download the lm-eval container.
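The idea behind this comment can be sketched as follows: the download step should loop over every configured image instead of only the lm-eval one. This does not reproduce `_ensure_singularity_image`. The helpers `images_to_ensure` and `singularity_exec_cmd` are hypothetical. Only the variable names `EVAL_CONTAINER_IMAGE`, `LIGHTEVAL_CONTAINER_IMAGE`, and `SINGULARITY_ARGS` come from the cluster config in this PR. The sketch also shows how `SINGULARITY_ARGS` (including the added `--env SSL_CERT_FILE=...`) could be split shell-style into an exec command.

```python
import shlex
from typing import Mapping

# Variable names taken from the cluster config in this PR.
IMAGE_VARS = ("EVAL_CONTAINER_IMAGE", "LIGHTEVAL_CONTAINER_IMAGE")


def images_to_ensure(env: Mapping[str, str]) -> list[str]:
    """Collect every configured container image, skipping unset vars and duplicates."""
    images: list[str] = []
    for var in IMAGE_VARS:
        image = env.get(var)
        if image and image not in images:
            images.append(image)
    return images


def singularity_exec_cmd(env: Mapping[str, str], image: str, *args: str) -> list[str]:
    """Build a 'singularity exec' command with SINGULARITY_ARGS split shell-style."""
    extra = shlex.split(env.get("SINGULARITY_ARGS", ""))
    return ["singularity", "exec", *extra, image, *args]
```

Each image returned by `images_to_ensure` would then be handed to the existing download logic. This way, a cluster that sets `LIGHTEVAL_CONTAINER_IMAGE` also gets `lighteval-jupiter.sif` before jobs start.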
To add lighteval for dclm on JUPITER, we had to add a separate jupiter-lighteval.def, because including the installations in the same .def file for JUPITER caused conflicts. The root cause is that JUPITER uses ARM64, and many of lighteval's runtime dependencies (spacy, underthesea, pyvi, etc.) lack pre-built aarch64 wheels.
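You can check whether a dependency ships a pre-built aarch64 wheel by inspecting the platform tag in its wheel filenames. These follow the `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` convention, and the platform field may hold several dot-separated tags. When no compatible wheel exists, pip typically falls back to building from the source distribution, which needs a full build toolchain inside the container. The helper names and filenames below are made up for illustration. The check only looks at platform tags, not at glibc versions or Python ABI compatibility.

```python
def platform_tags(wheel_filename: str) -> set[str]:
    """Return the platform tags of a wheel filename (last dash field, dot-separated)."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    stem = wheel_filename[: -len(".whl")]
    # Dashes inside names/versions are escaped to underscores in wheel names,
    # so the last dash-separated field is always the platform tag(s).
    platform_field = stem.split("-")[-1]
    return set(platform_field.split("."))


def has_aarch64_wheel(wheel_filenames: list[str]) -> bool:
    """True if any wheel is pure Python ('any') or tagged for aarch64 Linux."""
    for name in wheel_filenames:
        for tag in platform_tags(name):
            # e.g. manylinux2014_aarch64, manylinux_2_17_aarch64, musllinux_1_1_aarch64
            if tag == "any" or tag.endswith("_aarch64"):
                return True
    return False
```

A macOS `arm64` wheel does not count here, because it cannot be installed on an aarch64 Linux node.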