research-software-ecosystem/micoreca

Microbiome Community Resource Catalogue (MiCoReCa)

The rapid growth of microbiome research has led to the development of numerous bioinformatics tools and databases, but information about them remains fragmented across disparate, often outdated cataloging efforts, hindering resource discovery and utilization.

To address this gap, the ELIXIR Microbiome Community is collaborating with the Research Software Ecosystem to create MiCoReCa (Microbiome Community Resource Catalogue), a comprehensive, dynamic, open-access catalogue of microbiome-related bioinformatics resources.

Resources are extracted, filtered and curated following the workflow below, using the keywords defined in the keywords.yml file:

A workflow diagram illustrating the process for populating the RSEC Atlas, starting with 'Scrapping' resources from Bioconda, WorkflowHub, and Elixir, which yields over 40,000 tools and 1,300 workflows. This pool is reduced by the 'Filtering for Microbiome Resources' step to about 5,000 tools and 600 workflows. After 'Community Curation,' the items are 'Displayed on RSEC Atlas.' A detailed flowchart on the right explains the filtering logic: it sequentially checks if EDAM topics are found, if defined keywords are in the tags, if keywords are in the title, and finally if keywords are in the description. A 'Yes' at any step classifies the item as a 'MiCoCo resource,' while a 'No' at all steps classifies it as 'Not a MiCoCo resource.'

Prepare environment

  • Install virtualenv (if not already there)

    $ python3 -m pip install --user virtualenv
    
  • Create virtual environment

    $ python3 -m venv env
    
  • Activate virtual environment

    $ source env/bin/activate
    
  • Install requirements

    $ python3 -m pip install -r requirements.txt
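
    Before running the extraction scripts, it can help to confirm that the interpreter in use is the one from the virtual environment. The snippet below is a minimal sketch using only the standard library; the function name `in_virtualenv` is illustrative and not part of this repository.

```python
import sys


def in_virtualenv():
    """Return True when Python is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```

    If this reports False after activation, the `python` on the PATH is probably not the one under env/bin.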
    

Extract workflows from WorkflowHub

  • Extract the metadata of all workflows from WorkflowHub as a JSON file

    $ python bin/extract_workflowhub.py \
        extract \
        --all content/workflowhub/workflows_full.json
    
  • Filter workflows based on keywords and EDAM terms

    $ python bin/extract_workflowhub.py \
        filter \
        --all content/workflowhub/workflows_full.json \
        --filtered content/workflowhub/workflows_filtered.json \
        --tsv-filtered content/workflowhub/workflows_filtered.tsv \
        --tags keywords.yml \
        --status content/workflowhub/workflows_status.tsv
    

    As explained in the decision tree above, workflows are filtered first on EDAM terms (topics and operations), then on tags, then on workflow name, and finally on description, using the keywords provided in the keywords.yml file.
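
    The cascade described above can be sketched as a short function. This is an illustration only, not the code in bin/extract_workflowhub.py: the field names (`edam_terms`, `tags`, `name`, `description`), the case-insensitive substring matching, and the flat keyword list are all assumptions made for the example.

```python
# Illustrative sketch of the decision tree. Field names and matching rules
# are assumptions, not the actual schema used by the repository scripts.

def matches_any(text, keywords):
    """Return True if any keyword occurs in text (case-insensitive)."""
    lowered = (text or "").lower()
    return any(kw.lower() in lowered for kw in keywords)


def classify(item, edam_terms, keywords):
    """Return the first step of the cascade that matches, or None."""
    # Step 1: EDAM terms (topics and operations) attached to the item
    if any(term in edam_terms for term in item.get("edam_terms", [])):
        return "edam"
    # Step 2: keywords found in the tags
    if any(matches_any(tag, keywords) for tag in item.get("tags", [])):
        return "tags"
    # Step 3: keywords found in the name
    if matches_any(item.get("name"), keywords):
        return "name"
    # Step 4: keywords found in the description
    if matches_any(item.get("description"), keywords):
        return "description"
    # No step matched: not a microbiome resource
    return None


example = {
    "name": "Amplicon analysis",
    "tags": ["16S", "QC"],
    "description": "Profiles gut microbiome samples.",
}
print(classify(example, edam_terms={"Metagenomics"}, keywords=["microbiome"]))
```

    Returning the name of the matching step, rather than a plain boolean, makes it easier to see why an item was kept.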

Extract tools from Bioconda

  • Extract Bioconda recipe metadata as a JSON file, filtered using the keywords in keywords.yml

    mkdir -p ./tmp
    
    # download ZIP file *into tmp/*
    wget -O ./tmp/bioconda-recipes.zip https://codeload.github.com/bioconda/bioconda-recipes/zip/master
    
    # unzip from tmp into tmp/
    unzip ./tmp/bioconda-recipes.zip -d ./tmp/
    
    # remove the ZIP after extraction
    rm ./tmp/bioconda-recipes.zip
    
    # collect the recipe metadata and write the filtered JSON
    python bin/collect_bioconda_recipes.py \
        --bioconda-path ./tmp/bioconda-recipes-master/recipes \
        --keywords-file ./keywords.yml \
        --output-file ./content/bioconda_filtered.json
    
    # cleanup
    rm -r ./tmp
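
    After the collection step, a quick sanity check of the output file can catch an empty or malformed result. The sketch below makes no assumption about the internal structure of the JSON: it only reports the top-level type and the number of entries. The helper name `summarize_json` is hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return (top-level type name, number of entries) for a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Lists and objects have a length; any other JSON value counts as one entry
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


output = Path("content/bioconda_filtered.json")
if output.exists():
    kind, size = summarize_json(output)
    print(f"{output}: top-level {kind} with {size} entries")
else:
    print(f"{output} not found; run the collection step first")
```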
    

Run the unit tests locally

    $ PYTHONPATH=bin python -m unittest discover -s bin/tests

Contributing

To contribute to the MiCoReCa source code:

  1. Fork the repository.
  2. Create a branch and add your changes.
  3. Add a unit test for your changes (see unittests for examples).
    Warning: new functions require a unit test to be merged!
  4. Open a pull request.

The unit tests will run automatically on your PR. Please fix any failing tests.

Upon review, the maintainer will merge your pull request. Automatic tests will run on the dev branch.
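
For contributors unfamiliar with the unittest module, here is a minimal sketch of what a test for a new function could look like. The function `normalize_keyword` is hypothetical and exists only for this example; real tests should import the function under test from the scripts in bin/.

```python
import unittest


def normalize_keyword(keyword):
    """Hypothetical helper: trim whitespace and lowercase a keyword."""
    return keyword.strip().lower()


class TestNormalizeKeyword(unittest.TestCase):
    def test_strips_and_lowercases(self):
        # Surrounding whitespace is removed and case is folded
        self.assertEqual(normalize_keyword("  Microbiome "), "microbiome")

    def test_empty_string(self):
        # An empty keyword stays empty
        self.assertEqual(normalize_keyword(""), "")
```

By default, unittest discovery only collects files whose names match test*.py, so a test file placed under bin/tests needs a name of that form to be found by the command in the section above.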
