The pipeline has been designed in close collaboration with TACC. It is not expected to be usable on another platform without substantial effort.
tl;dr: most of the work happens in folders with the suffix "_app", specifically in the bash scripts named either runner-template.sh or runner.sh.
Several files in this repository are not related to processing but are instead configuration files necessary for the computational environment. If you would like to know more about how the repository is structured, please review
Each "app" comprises a distinct module of analysis (e.g., a pipeline like mriqc, conversion to BIDS, aggregation of files). Apps are triggered by "actors".
Each app is a bash script that is defined by several parameters (e.g., the mriqc app uses BIDS_DIRECTORY, which defines the location of the BIDS directory that will be passed to the mriqc container). The parameters are set for each job through a JSON configuration file (e.g., job.json). Tapis interprets the app structure and uses the job script both to fill the parameters into the runners and to embed the runners in a script suitable for scheduling on a cluster (TACC uses SLURM, so the resulting script has its #SBATCH lines filled in). In short, an app receives a JSON file, uses it to configure a cluster batch script, and then runs that script on the cluster.
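To make the flow concrete, here is a minimal sketch of what such a job configuration might look like. Only BIDS_DIRECTORY comes from this repository; the other field names (name, appId) and the path are illustrative assumptions, not copied from an actual job.json:

```json
{
  "name": "mriqc-NS10042V1",
  "appId": "mriqc_app",
  "parameters": {
    "BIDS_DIRECTORY": "/path/to/products/mris/NS_northshore/bids/NS10042V1"
  }
}
```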
The JSON files are created by the actors. The parameters that will be filled are set in YAML files (e.g., mriqc_actor/config.yml), which are processed by the actors' reactor.py scripts. After creating a job, the actor monitors the job's status, and if the job is successful the actor may trigger another actor.
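For illustration, an actor's YAML might pair an app with the parameters its reactor.py fills in per session, along these lines. The keys shown are hypothetical stand-ins, not the actual contents of mriqc_actor/config.yml:

```yaml
# hypothetical sketch of an actor config (cf. mriqc_actor/config.yml)
app_id: mriqc_app
parameters:
  # reactor.py would substitute the session-specific path here
  BIDS_DIRECTORY: "{bids_directory}"
```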
Note that the Tapis apps in this repository are analogous to BIDS Apps, but there is not a one-to-one correspondence. For example, in this repository, the QSIprep, MRIQC, and fMRIPrep apps are essentially light wrappers around an individual BIDS App, wrappers that define a particular way of calling the containers. However, there is no official CAT12 BIDS App for this repo's CAT12 app to use. Similarly, there is no HeuDiConv BIDS App, because HeuDiConv is designed for getting data into a BIDS format in the first place.
Scans are processed on a per-subject, per-app basis.
```
A2CPS
[...]
├── products                      # files generated by the pipeline
│   ├── consortium-data           # beyond mri, A2CPS generates several kinds of products
│   ├── development
│   ├── dirc-data
│   └── mris                      # outputs of mri_imaging_pipeline live mainly under A2CPS/products/mris
│       ├── all_sites             # results that aggregate across several sites
│       │   ├── bids
│       │   ├── cat12
│       │   ├── fcn
│       │   [...]
│       └── NS_northshore         # most products are stored in a subdirectory associated with the pipeline
│           ├── aa-fmri-phantom-qa
│           ├── bids
│           │   ├── NS10042V1     # pipelines process individual sessions
│           │   ├── NS10047V1     # each pipeline subdirectory (e.g., cat12, dicoms, fmriprep) is filled with
│           │   └── NS10047V3     # a similar collection of products associated with individual sessions
│           ├── cat12
│           ├── dicoms
│           │   ├── NS10042V1.zip # note that the zip files have names that match the outputs of other pipelines
│           │   ├── NS10047V1.zip
│           │   └── NS10047V3.zip
│           [...]
└── submissions                   # original data, restricted access! sites upload scans into their assigned folder
    ├── a2cps_testdata            # files under /submissions are not touched by us; any modifications (e.g., editing a DICOM header) are done by the site
    ├── a2dtn01
    ├── NS_northshore
    ├── NS_northshore_EHR
    [...]
```
After a scan has been completed, sites export the DICOMs from the scanner[^1], zip[^2] them[^3], and then upload the scan[^4] to TACC[^5]. The existence of new uploads is monitored with a cronjob (see cronjob.sh). When the cronjob detects a new upload, it submits a job to the dicom_reader_actor, which in turn triggers the dicom_reader_app to make a copy of the DICOMs[^6].
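The detection step can be sketched in Python. This is a hypothetical re-implementation of the kind of check cronjob.sh performs; the real script is bash, and it may track already-processed uploads differently:

```python
from pathlib import Path


def find_new_uploads(submissions: Path, seen: set[str]) -> list[Path]:
    """Return zipped scan uploads that have not been processed yet.

    Hypothetical sketch: scan each site's folder under /submissions for
    zip files whose names are not in the set of already-seen uploads.
    """
    return sorted(
        p
        for p in submissions.glob("*/*.zip")  # e.g., NS_northshore/NS10042V1.zip
        if p.name not in seen
    )
```

Each new path found this way would then be handed to the dicom_reader_actor as a job.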
After a successful copy, the dicom_reader_actor then triggers the HeuDiConv actor/app to convert the DICOM files into BIDS[^7]. This app not only runs heudiconv but also takes several steps to clean and check its outputs[^8].
If the conversion is successful, then the scan will be in a BIDS format and the HeuDiConv actor will trigger more Tapis actors. Currently, these include MRIQC[^9], fMRIPrep, QSIprep, and CAT12.
Outputs are aggregated weekly with the aggregator actor/app (scheduled via cronjob). The aggregator_app gathers outputs from products/mris/ and stores them together in products/mris/all_sites. Participants are aggregated only when all possible derivatives have been generated (e.g., a participant will not be aggregated if they have gone through the entire pipeline except CAT12). Analyses typically rely on aggregated outputs (e.g., the aggregated bids output). These aggregated outputs are a superset of releases, typically including more participants and more types of derivatives.
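The gating rule ("aggregate only when every derivative exists") can be sketched as follows. The REQUIRED list and directory layout are illustrative assumptions, not the aggregator_app's actual configuration:

```python
from pathlib import Path

# Pipelines whose outputs must all exist before a session is aggregated.
# This list is illustrative; the aggregator_app defines the real set.
REQUIRED = ("bids", "cat12", "fmriprep", "mriqc", "qsiprep")


def ready_for_aggregation(site_dir: Path, session: str) -> bool:
    """True only when every required derivative exists for this session."""
    return all((site_dir / pipeline / session).exists() for pipeline in REQUIRED)
```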
QC results are generated by the aggregator_qc actor/app (scheduled via cronjob). This step uses outputs from the aggregator app and several others. The primary output of this app is a table of ratings[^10], one rating per scan[^11].
| site | sub | ses | scan | rating | source | date | notes |
|---|---|---|---|---|---|---|---|
| NS | 10042 | V1 | CUFF1 | green | auto | | |
| NS | 10042 | V1 | CUFF2 | green | auto | | |
| NS | 10042 | V1 | DWI | green | auto | | |
| NS | 10042 | V1 | REST1 | green | auto | | |
| NS | 10042 | V1 | REST2 | green | auto | | |
| NS | 10042 | V1 | T1w | green | auto | 2022-09-20 | |
| NS | 10047 | V1 | CUFF1 | green | psadil | 2022-09-09 | wrap-around, ghosts-other, uncategorized |
| NS | 10047 | V1 | CUFF2 | green | auto | | |
| NS | 10047 | V1 | DWI | green | auto | | |
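A downstream check might consume this table along the following lines. The rows and the helper are illustrative (only a subset of the columns is shown), not part of the aggregator_qc app:

```python
import csv
from io import StringIO

# Invented sample rows mimicking the ratings table (subset of columns).
RATINGS = """site,sub,ses,scan,rating,source
NS,10042,V1,T1w,green,auto
NS,10047,V1,CUFF1,yellow,psadil
"""


def needs_review(rows: list[dict]) -> list[dict]:
    """Flag scans whose rating is not an automatic green."""
    return [r for r in rows if r["rating"] != "green" or r["source"] != "auto"]


rows = list(csv.DictReader(StringIO(RATINGS)))
flagged = needs_review(rows)
```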
For details about how ratings are assigned, see qaqc-summary.pdf and qa-qc-strategy.pptx.
Several apps/actors are scheduled to run via cron -- either in an admin's crontab or using the cron features of Tapis Actors.
| job | schedule |
|---|---|
| dicom_reader_app | every 10 minutes |
| aggregator_app | weekly on Tuesday |
| imaging_log | nightly at 11pm |
| qc_aggregator_actor | weekly on Tuesday |
| aggregator_phantom | weekly on Tuesday |
| fcn_actor | weekly on Tuesday |
| fslanat_actor | weekly on Tuesday |
| signatures_actor | weekly on Tuesday |
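As a rough illustration, the admin-crontab entries could look like the following. The paths and times of day are hypothetical; the table above only specifies the day and frequency, and several of these jobs are actually scheduled through Tapis Actors rather than a crontab:

```
# hypothetical crontab sketch (paths and clock times are illustrative)
*/10 * * * *  /path/to/cronjob.sh           # check for new uploads (dicom_reader_app)
0 23 * * *    /path/to/imaging_log.sh       # nightly imaging log
0 6  * * 2    /path/to/run_aggregator.sh    # weekly aggregation, Tuesdays
```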
Aspects of the QA/QC pipeline are summarized in several reports, some of which are generated automatically (e.g., daily, via cronjobs) and some of which are curated. An example of an automated report is available here, which is produced by code in this github repo.
Note that although the reports summarize information about the pipeline (e.g., quality of scans received), they also draw on other sources of information. In particular, the research assistants and technologists at the sites record information about each visit in REDCap (e.g., whether there were deviations from the protocol, a rating of the anatomical image, information about the task, and other notes).
In addition to regular reports, several parts of the pipeline generate automated notifications. The notifications go to a Slack channel in a workspace that is dedicated to A2CPS. These notifications are helpful for the following reasons:
- They confirm that scans are being processed, which matters because processing is triggered automatically.
- They help quickly identify when there is an issue with a scan (e.g., the acquisition parameters were not as expected).
Footnotes

[^1]: On each scanner, the runs are named according to the ReproIn specification. The specification allows for conversion by HeuDiConv into BIDS with a heuristic that is built into the heudiconv package. However, not all sites followed this specification, and so we have needed to use these heuristic modifications.
[^2]: Several of the scanners export 2D DICOM, which means that functional runs can comprise tens of thousands of files. We have found that this many files can "stress" the filesystem, which manifests as either general slowness or even a refusal to create new files. See the wikipedia article on inodes.
[^3]: One site was unable to zip the DICOM files. This kind of lack of standardization should be avoided, because it creates several unexpected headaches.
[^4]: Participants are associated with a unique label that follows the pattern `<site><numeric_id>V<visit>`. For example, the scans associated with the first visit of a participant from site NS that has ID 10001 would be associated with the label NS10001V1. These labels are stored in the PatientName DICOM header field and used to name the uploaded zip files.
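A label of this form can be parsed with a regular expression. The pattern below is inferred from examples like NS10001V1 (two-letter site code, numeric id, visit number) and is not the pipeline's canonical definition:

```python
import re

# Inferred from examples such as "NS10001V1"; not the canonical definition.
LABEL = re.compile(r"(?P<site>[A-Z]{2})(?P<sub>\d+)V(?P<ses>\d+)")


def parse_label(label: str) -> dict:
    """Split a session label into site, participant id, and visit."""
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a recognized session label: {label!r}")
    return m.groupdict()
```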
[^5]: Each site has read and write permissions for just one folder on a secure storage system at TACC, and the sites can access that folder over ssh.
[^6]: Although most of the apps could run participants in parallel, we group each session into a distinct BIDS dataset and then process those datasets separately.
[^7]: By default, HeuDiConv makes a gzipped tar archive of each DICOM series. This means that we are storing three copies of the DICOMs (the original data that was submitted by the site, a zipped copy, and the copy stored by HeuDiConv in the BIDS folders). Each of these serves a slightly different purpose, but they take up lots of space (not an issue for our environment).
[^8]: As a few examples: files that do not follow the BIDS standard are removed, the resulting JSON sidecars are compared against reference values for that site (e.g., image dimensions, phase encoding direction, scan length), and the final output is checked with the bids validator.
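The sidecar comparison can be sketched as follows. The reference fields and values are invented for illustration; the real app keeps per-site reference values:

```python
# Illustrative reference values; the real app keeps per-site references.
REFERENCE = {"PhaseEncodingDirection": "j-", "RepetitionTime": 0.8}


def check_sidecar(sidecar: dict, reference: dict) -> list[str]:
    """Return one message for each field that deviates from the reference."""
    return [
        f"{key}: expected {expected!r}, found {sidecar.get(key)!r}"
        for key, expected in reference.items()
        if sidecar.get(key) != expected
    ]
```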
[^9]: The A2CPS paradigm includes two tasks, and we process these as separate jobs. One of the measures produced by MRIQC (and fMRIPrep) is fd_perc, the proportion of frames in a run that are above a given motion threshold. The two tasks were expected to produce different amounts of motion, so a different threshold was selected for each task; MRIQC (and fMRIPrep) does not allow task-specific thresholds, hence the decision to process the jobs separately. Note that, after collecting several hundred scans, there is not yet substantial evidence to support the use of different thresholds.
[^10]: site: location of scan; sub: numeric id associated with participant; ses: session/visit identifier; scan: name of scan in session (encodes run number); rating: quality rating of scan (green, yellow, or red); source: source of the rating (auto: based on automatically derived features; researcher name: that researcher made a judgment call); date: day when the rating was made; notes: standardized notes about the scan[^11].
[^11]: Most of the notes are derived from the MRIQC visual reports. That report has a method of rating scans and provides a list of common scanner artifacts. The notes in the rating table are derived from that standardized list of artifacts.