maizecode/docs/MaizeCode-data-management.md at master · warelab/maizecode

MaizeCode Data Management

MaizeCode data are processed locally and released through the CyVerse infrastructure. Automated workflows will be constructed to process MaizeCode sequencing data with a CyVerse CSHL federation system (the SciApps system). The same workflows, once mature, will be released for the community to leverage XSEDE computing resources and CyVerse Data Store.

Data flow

A local copy of MaizeCode data will be kept on the SciApps system to support re-analysis and workshops during the project period. Raw data will be submitted to sequence repositories with CyVerse tools/pipelines. Important results will be hosted in CyVerse Data Store, released through MaizeCode.org web portal, and further curated for releasing through CyVerse Data Commons.

SciApps system

SciApps is a cloud based platform for building, executing, & sharing scientific applications (Apps) and workflows, powered by a local CyVerse system at Cold Spring Harbor Laboratory. It uses Agave API to virtualize local compute and storage servers, register metadata, and build modular bioinformatics applications. Using these modular apps, automated workflows are constructed on SciApps to improve both efficiency and reproducibility.

A cluster of six servers with 60 cores and 100 TB storage
Modular apps are optimized for local compute and data
Data are kept locally (in CSHL) during the entire analysis cycle
Workflows capture metadata, relationship, and history of MaizeCode data
Apps are transferable among clouds by utilizing the Singularity containers
Workflows are reproducible with a light-weight JSON file

Metadata

A combination of Google forms, CyVerse metadata templates, and MaizeCode metadata JSON templates will be used to support collecting, management, display, and searching MaizeCode metadata.

Google forms are used to collect metadata
Agave metadata schemas (in JSON) are created to match the Google forms
Metadata are converted from Google form to Agave
Metadata are displayed on SciApps workflow diagram

Community Access

Once mature, all MaizeCode data, apps and workflows will be made public for the community to access through the CyVerse infrastructure. For analysis submitted by community users, data will be stored in CyVerse Data Store and processed with XSEDE computing resources.

Annotation with XSEDE JetStream
ENCODE assays with XSEDE HPC cluster Stampede2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MaizeCode Data Management

Data flow

SciApps system

Metadata

Community Access

FilesExpand file tree

MaizeCode-data-management.md

Latest commit

History

MaizeCode-data-management.md

File metadata and controls

MaizeCode Data Management

Data flow

SciApps system

Metadata

Community Access