Merged
30 changes: 23 additions & 7 deletions .ci/README.md
@@ -5,24 +5,39 @@ This document describes the external pipeline executed through CSCS.
The pipeline can be triggered by commenting on a pull request with

```
cscs-ci run default # runs the default pipeline
cscs-ci run default # runs the default pipeline (on GH200 nodes @ CSCS)
cscs-ci run beverin # runs the beverin pipeline (on MI300A nodes @ CSCS)
```

An automatic trigger on all merge-requests is currently disabled.

This pipeline has 2 stages: `build` and `test`.
This pipeline has 3 stages: `prepare`, `build` and `test`.

The `build` stage builds a uenv image that includes all necessary compilers, MPI libraries and other dependencies to build QUDA and tmLQCD against QUDA. In this stage, QUDA is built correctly for the GH200 machine at CSCS with all required build flags for production runs. The uenv recipe can be found [here](uenv-recipes/tmlqcd/daint-gh200).
## `prepare` stage

In the `test` stage, the aforementioned uenv image is loaded, tmLQCD is built and linked against the QUDA library that is inside the image. Finally a minimal HMC is executed and checked against some reference data.
The `prepare` stage builds a uenv image that includes all compilers, MPI libraries and other dependencies needed to build QUDA and tmLQCD against QUDA. The uenv recipes can be found [here for GH200](uenv-recipes/tmlqcd/daint-gh200) and [here for MI300A](uenv-recipes/tmlqcd/beverin-mi300).

## Force recompilation of quda
## `build` stage

In the `build` stage, the uenv image from the `prepare` stage is loaded, and tmLQCD and QUDA are built with their Spack packages against the dependencies provided by the base image. This stage exposes an artifact (`builddir.tar`) containing the tmLQCD and QUDA build directories. For tmLQCD, the current branch is compiled. For QUDA, the following environment variables are respected:

* `QUDA_GIT_REPO`: the git repository URL to use as source (defaults to `https://github.com/lattice/quda.git`)
* `QUDA_GIT_BRANCH`: the git branch to compile (defaults to `develop`)
* `QUDA_GIT_COMMIT`: the git commit to compile (defaults to the current head commit of `QUDA_GIT_BRANCH`)

QUDA is then cloned from these settings and compiled from source, bypassing the Spack build cache entirely.
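How the three variables fall back to their defaults can be sketched in Python. This is a hypothetical illustration, not code used by the pipeline: `resolve_quda_source` and `head_commit_from_ls_remote` are made-up names, and the `git ls-remote` call is passed in as a function so the logic runs without network access. It mirrors the `${VAR:=default}` expansions in `.ci/build.sh`, which also replace variables that are set but empty.

```python
from typing import Callable, Dict, Mapping

# Defaults as written in .ci/build.sh
DEFAULT_REPO = "https://github.com/lattice/quda.git"
DEFAULT_BRANCH = "develop"


def head_commit_from_ls_remote(output: str, branch: str) -> str:
    """Pick the hash of refs/heads/<branch> from `git ls-remote` output.

    Each output line is "<hash><TAB><ref>"; build.sh keeps the first field
    with awk '{print $1}'. Returns "" if the branch is not found.
    """
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == wanted:
            return fields[0]
    return ""


def resolve_quda_source(
    env: Mapping[str, str], ls_remote: Callable[[str, str], str]
) -> Dict[str, str]:
    """Resolve QUDA_GIT_REPO/BRANCH/COMMIT the way `${VAR:=default}` does.

    `ls_remote(repo, ref)` stands in for `git ls-remote <repo> <ref>` and is
    only called when no commit is given (bash expands the default lazily).
    """
    # `:=` also replaces variables that are set but empty, hence `or`
    repo = env.get("QUDA_GIT_REPO") or DEFAULT_REPO
    branch = env.get("QUDA_GIT_BRANCH") or DEFAULT_BRANCH
    commit = env.get("QUDA_GIT_COMMIT") or head_commit_from_ls_remote(
        ls_remote(repo, f"refs/heads/{branch}"), branch
    )
    return {"QUDA_GIT_REPO": repo, "QUDA_GIT_BRANCH": branch, "QUDA_GIT_COMMIT": commit}
```

In other words, setting only `QUDA_GIT_BRANCH` builds the current head of that branch, while setting `QUDA_GIT_COMMIT` skips the remote lookup entirely.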

## `test` stage

In the `test` stage, the uenv image is loaded again and the tmLQCD and QUDA builds are unpacked from the artifact. Finally, a minimal HMC run is executed and its output is checked against reference data.
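The reference check can be approximated in Python as below. This is a sketch, not the pipeline's check: the pipeline calls `numdiff`, and its exact definition of relative error as well as the `-X` field exclusions used for `output.data` are not reproduced here. The sketch assumes whitespace-separated numeric tables and compares fields with `math.isclose`; the pipeline configuration uses a relative tolerance of `5e-4` for the `onlinemeas.*` files and `1e-5` for `output.data`.

```python
import math
from typing import List


def onlinemeas_names(start: int = 0, stop: int = 18, step: int = 2) -> List[str]:
    """File names produced by `for i in $(seq 0 2 18); ... printf %06d $i`."""
    # seq includes its upper bound, hence stop + 1 for range()
    return [f"onlinemeas.{i:06d}" for i in range(start, stop + 1, step)]


def compare_tables(text: str, ref: str, rel: float) -> List[str]:
    """Return mismatches between two whitespace-separated numeric tables.

    Fields are compared with math.isclose(rel_tol=rel), i.e.
    |a - b| <= rel * max(|a|, |b|); this approximates `numdiff -r`
    but may differ from numdiff's exact error definition.
    """
    problems: List[str] = []
    rows, ref_rows = text.splitlines(), ref.splitlines()
    if len(rows) != len(ref_rows):
        problems.append(f"line count {len(rows)} != {len(ref_rows)}")
    for n, (row, ref_row) in enumerate(zip(rows, ref_rows), start=1):
        fields, ref_fields = row.split(), ref_row.split()
        if len(fields) != len(ref_fields):
            problems.append(f"line {n}: field count differs")
            continue
        for k, (x, y) in enumerate(zip(fields, ref_fields), start=1):
            if not math.isclose(float(x), float(y), rel_tol=rel):
                problems.append(f"line {n}, field {k}: {x} vs {y}")
    return problems
```

An empty list means every field is within the tolerance; the check passes only if this holds for all ten `onlinemeas.*` files.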

## Force recompilation of base image in `prepare` stage

Remove the build cache:

```bash
/capstor/scratch/cscs/${USER}/uenv-cache/user-environment/build_cache/linux-sles15-neoverse_v2/gcc-13.2.0/quda-*
/capstor/scratch/cscs/${USER}/uenv-cache/user-environment/build_cache/linux-sles15-neoverse_v2-gcc-13.2.0-quda*
/capstor/scratch/cscs/${USER}/uenv-cache/user-environment/build_cache/linux-sles15-neoverse_v2/gcc-13.2.0/tmlqcd-*
/capstor/scratch/cscs/${USER}/uenv-cache/user-environment/build_cache/linux-sles15-neoverse_v2-gcc-13.2.0-tmlqcd*
```
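If you prefer to script the cleanup, a minimal Python sketch follows. It assumes the cache layout listed above (the GH200 `neoverse_v2` paths) and reads `$USER` from the environment; `purge_build_cache` and `default_cache_root` are hypothetical helpers, and nothing is deleted unless `dry_run=False` is passed.

```python
import glob
import os
import shutil
from typing import List, Sequence

# Patterns copied from the list above (GH200 / neoverse_v2 cache layout)
PATTERNS = (
    "linux-sles15-neoverse_v2/gcc-13.2.0/quda-*",
    "linux-sles15-neoverse_v2-gcc-13.2.0-quda*",
    "linux-sles15-neoverse_v2/gcc-13.2.0/tmlqcd-*",
    "linux-sles15-neoverse_v2-gcc-13.2.0-tmlqcd*",
)


def default_cache_root() -> str:
    """Cache root as listed above, with ${USER} read from the environment."""
    user = os.environ.get("USER", "")
    return f"/capstor/scratch/cscs/{user}/uenv-cache/user-environment/build_cache"


def purge_build_cache(
    root: str, patterns: Sequence[str] = PATTERNS, dry_run: bool = True
) -> List[str]:
    """Delete cache entries matching `patterns` under `root`; return the matches.

    With dry_run=True (the default) nothing is deleted, matches are only printed.
    """
    matches = sorted({m for pat in patterns for m in glob.glob(os.path.join(root, pat))})
    for path in matches:
        if dry_run:
            print(f"would remove {path}")
        elif os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # remove a whole directory entry
        else:
            os.remove(path)  # plain file or symlink
    return matches
```

A typical invocation would be `purge_build_cache(default_cache_root())` to review the matches, then the same call with `dry_run=False`.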

Or increment the version counter tag in [.ci/include/cscs/00-variables.yml](include/cscs/00-variables.yml):
@@ -46,3 +61,4 @@ and commit.
* [CSCS Uenv Writing Documentation](https://eth-cscs.github.io/alps-uenv/)
* [CSCS Status Page](https://status.cscs.ch/)
* [CSCS Spack Base Containers](https://github.com/orgs/eth-cscs/packages/container/package/docker-ci-ext%2Fspack-base-containers%2Fspack-build)
* [Sirius CI/CD](https://github.com/electronic-structure/SIRIUS/tree/develop/ci), on which this pipeline is based
52 changes: 52 additions & 0 deletions .ci/build.sh
@@ -0,0 +1,52 @@
#!/bin/bash

set -xeuo pipefail

export SPACK_SYSTEM_CONFIG_PATH="/user-environment/config"
export SPACK_PYTHON="$(which python3.6)" # must be <=3.12, system python is 3.6
export CICD_SRC_DIR="$PWD"
export QUDA_SRC_DIR="$PWD/deps/src/quda"

# QUDA git, branch and commit
export QUDA_GIT_REPO="${QUDA_GIT_REPO:=https://github.com/lattice/quda.git}"
export QUDA_GIT_BRANCH="${QUDA_GIT_BRANCH:=develop}"
export QUDA_GIT_COMMIT="${QUDA_GIT_COMMIT:=$(git ls-remote ${QUDA_GIT_REPO} refs/heads/${QUDA_GIT_BRANCH} | awk '{print $1}')}"

# obtain QUDA
git clone -b "${QUDA_GIT_BRANCH}" "${QUDA_GIT_REPO}" "${QUDA_SRC_DIR}"
git -C "${QUDA_SRC_DIR}" checkout "${QUDA_GIT_COMMIT}"

# make sure we keep the stage directory
spack config --scope=spack add config:build_stage:/dev/shm/spack-stage
# we might need to install dependencies too, so keep the install tree in /dev/shm as well
spack config --scope=spack add config:install_tree:root:/dev/shm/spack-stage

spack env create -d ./spack-env

# add local repository with current tmlqcd recipe
spack -e ./spack-env repo add "${REPO}"

spack -e ./spack-env config add "packages:all:variants:[${VARIANTS}]"

spack -e ./spack-env add "${SPEC}"

# for tmlqcd use local src instead of fetch git
spack -e ./spack-env develop -p "${CICD_SRC_DIR}" tmlqcd@cicd

# for quda use local src instead of fetching from git, to be able to test
# against a different repo, branch or commit, and because the quda develop
# branch is a moving target
spack -e ./spack-env develop -p "${QUDA_SRC_DIR}" quda@cicd

# display spack.yaml
cat ./spack-env/spack.yaml

spack -e ./spack-env concretize
spack -e ./spack-env install

# absolute paths of the build directories; tar strips the leading '/' when
# archiving, so the test stage restores them with `tar xf ... -C /`
builddir_tmlqcd=$(spack -e ./spack-env location -b tmlqcd)
builddir_quda=$(spack -e ./spack-env location -b quda)

# pack both spack build directories into a tarball (kept as CI artifact)
tar -cf builddir.tar "${builddir_tmlqcd}" "${builddir_quda}"
52 changes: 52 additions & 0 deletions .ci/cscs_beverin_pipeline.yml
@@ -0,0 +1,52 @@
include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'
  - local: '/.ci/include/cscs/00-variables.yml'
  - local: '/.ci/include/cscs/01-build-templates.yml'
  - local: '/.ci/include/cscs/02-test-templates.yml'


stages:
  - prepare
  - build
  - test


build-base/uenv/beverin-mi300:
  stage: prepare
  extends: [.uenv-builder-beverin-mi300, .beverin-mi300-secrets]
  variables:
    UENV_RECIPE: .ci/uenv-recipes/tmlqcd/beverin-mi300
    SLURM_TIMELIMIT: "08:00:00"


build-tmlqcd/uenv/beverin-mi300:
  extends: [.uenv-runner-beverin-mi300, .build/base, .beverin-mi300-secrets]
  needs: [build-base/uenv/beverin-mi300]
  variables:
    SPEC: "tmlqcd@cicd +lemon +quda ^quda@cicd +qdp +multigrid +twisted_clover +twisted_mass"
    REPO: "./.ci/uenv-recipes/tmlqcd/beverin-mi300/repo/"
    VARIANTS: "amdgpu_target=gfx942,amdgpu_target_sram_ecc=gfx942,+rocm,+mpi"
    SLURM_TIMELIMIT: "01:00:00"


test/beverin-mi300:
  extends: [.uenv-runner-beverin-mi300, .test/base, .beverin-mi300-secrets]
  needs: [build-tmlqcd/uenv/beverin-mi300]
  variables:
    REFPATH: "doc/sample-output/hmc-quda-cscs"
    QUDA_ENABLE_TUNING: 0 # disable tuning
    QUDA_ENABLE_P2P: 0 # disable P2P
    SLURM_JOB_NUM_NODES: 2
    SLURM_NTASKS: 8
    SLURM_TIMELIMIT: "01:00:00"
  script:
    - hmc_tm -f doc/sample-input/sample-hmc-quda-cscs-beverin.input
    - |
      if test "${SLURM_PROCID}" -eq "0"; then
        echo "Check the results on SLURM_PROCID=${SLURM_PROCID} ..."
        numdiff -r 1e-5 -X 1:22 -X 1:5-21 -X 2:22 -X 2:5-21 output.data ${REFPATH}/output.data
        for i in $(seq 0 2 18); do
          f=onlinemeas.$(printf %06d $i);
          numdiff -r 5e-4 ${f} ${REFPATH}/${f};
        done
      fi
41 changes: 34 additions & 7 deletions .ci/cscs_default_pipeline.yml
@@ -1,25 +1,52 @@
include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'
  - local: '/.ci/include/cscs/00-variables.yml'
  - local: '/.ci/include/cscs/01-test-templates.yml'
  - local: '/.ci/include/cscs/01-build-templates.yml'
  - local: '/.ci/include/cscs/02-test-templates.yml'


stages:
  - prepare
  - build
  - test

build-quda/uenv/daint-gh200:
  stage: build

build-base/uenv/daint-gh200:
  stage: prepare
  extends: .uenv-builder-daint-gh200
  variables:
    UENV_RECIPE: .ci/uenv-recipes/tmlqcd/daint-gh200
    SLURM_TIMELIMIT: "04:00:00"


build-tmlqcd/uenv/daint-gh200:
  extends: [.uenv-runner-daint-gh200, .build/base]
  needs: [build-base/uenv/daint-gh200]
  variables:
    SPEC: "tmlqcd@cicd +lemon +quda ^quda@cicd +qdp +multigrid +twisted_clover +twisted_mass"
    REPO: "./.ci/uenv-recipes/tmlqcd/daint-gh200/repo/"
    VARIANTS: "cuda_arch=90,+cuda,+mpi"
    SLURM_TIMELIMIT: "01:00:00"


test/daint-gh200:
  extends: .test/hmc
  extends: [.uenv-runner-daint-gh200, .test/base]
  needs: [build-tmlqcd/uenv/daint-gh200]
  variables:
    INPUT_FILE: "doc/sample-input/sample-hmc-quda-cscs.input"
    REFPATH: "doc/sample-output/hmc-quda-cscs"
    QUDA_ENABLE_TUNING: 0 # disable tuning
    QUDA_ENABLE_GDR: 1 # enable GPU-Direct RDMA
    QUDA_ENABLE_GDR: 0 # disable GPU-Direct RDMA
    SLURM_JOB_NUM_NODES: 2
    SLURM_NTASKS: 8
    SLURM_TIMELIMIT: "00:30:00"
    SLURM_TIMELIMIT: "01:00:00"
  script:
    - hmc_tm -f doc/sample-input/sample-hmc-quda-cscs.input
    - |
      if test "${SLURM_PROCID}" -eq "0"; then
        echo "Check the results on SLURM_PROCID=${SLURM_PROCID} ..."
        numdiff -r 1e-5 -X 1:22 -X 1:5-21 -X 2:22 -X 2:5-21 output.data ${REFPATH}/output.data
        for i in $(seq 0 2 18); do
          f=onlinemeas.$(printf %06d $i);
          numdiff -r 5e-4 ${f} ${REFPATH}/${f};
        done
      fi
9 changes: 7 additions & 2 deletions .ci/include/cscs/00-variables.yml
@@ -9,5 +9,10 @@
variables:
  UENV_NAME: tmlqcd
  UENV_VERSION: experimental
  UENV_TAG: v0.0.7
  UENV_VERSION: v1
  UENV_TAG: v0.0.8

# FirecREST client id and secret for the beverin pipeline
.beverin-mi300-secrets:
  variables:
    F7T_CLIENT_ID: $F7T_TDS_CONSUMER_KEY
    F7T_CLIENT_SECRET: $F7T_TDS_CONSUMER_SECRET
17 changes: 17 additions & 0 deletions .ci/include/cscs/01-build-templates.yml
@@ -0,0 +1,17 @@
include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'


.build/base:
  stage: build
  image: ${UENV_NAME}/${UENV_VERSION}:${UENV_TAG}
  artifacts:
    paths:
      - builddir.tar
  variables:
    SLURM_TIMELIMIT: "01:00:00"
  script:
    - git clone --filter=tree:0 $(jq -r .spack.repo /user-environment/meta/configure.json) /dev/shm/spack-clone
    - git -C /dev/shm/spack-clone checkout $(jq -r .spack.commit /user-environment/meta/configure.json)
    - source /dev/shm/spack-clone/share/spack/setup-env.sh
    - bwrap --dev-bind / / --tmpfs ~ -- ./.ci/build.sh
53 changes: 0 additions & 53 deletions .ci/include/cscs/01-test-templates.yml

This file was deleted.

20 changes: 20 additions & 0 deletions .ci/include/cscs/02-test-templates.yml
@@ -0,0 +1,20 @@
include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'


.test/base:
  stage: test
  image: ${UENV_NAME}/${UENV_VERSION}:${UENV_TAG}
  variables:
    WITH_UENV_VIEW: "default"
  before_script:
    - |
      if test "${SLURM_LOCALID}" -eq "0"; then
        tar xf ./builddir.tar -C /
        touch preparation-done-${CI_JOB_ID}-$(hostname)
      fi
    - while test ! -f preparation-done-${CI_JOB_ID}-$(hostname); do sleep 5; done
    - bindir=$(echo /dev/shm/spack-stage/*/spack-stage-tmlqcd-cicd-*/spack-build-*/src/bin)
    - libdir=$(dirname $(echo /dev/shm/spack-stage/*/spack-stage-quda-cicd-*/spack-build-*/lib/libquda.so))
    - export PATH=${bindir}:$PATH
    - export LD_LIBRARY_PATH=${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
29 changes: 29 additions & 0 deletions .ci/spack_packages/lemonio/package.py
@@ -0,0 +1,29 @@
# Copyright Spack Project Developers. See COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)

from spack_repo.builtin.build_systems.cmake import CMakePackage, generator

from spack.package import *


class Lemonio(CMakePackage):
    """LEMON: Lightweight Parallel I/O library for Lattice QCD."""

    homepage = "https://github.com/etmc/lemon"
    git = "https://github.com/etmc/lemon.git"

    license("GPL-3.0-or-later")

    version("master", branch="master")

    depends_on("c", type="build")
    depends_on("cxx", type="build")
    depends_on("fortran", type="build")

    depends_on("mpi")

    generator("ninja")

    def cmake_args(self):
        # no extra CMake options are passed; the project defaults are used
        return []