BBMRI-cz/data-catalogue-pseudonymisation

data-catalogue-pseudonymisation

This is the repository for the pseudonymisation part of the BBMRI.cz data catalog.

Pseudonymisation

Pseudonymizes predictive numbers, collects clinical data and removes unnecessary files before moving the data to SensitiveCloud at ICS-MUNI.

Supported sequencing types

Miseq, New Miseq, MammaPrint

How to run the scripts

Dev environment

Using main.py

  1. Install requirements
pip install -r requirements.txt
  2. Run main.py
python main.py -s /path/to/runs/for/pseudonymization -d /path/to/sensitive/cloud/destination \
               -t /path/to/pseudonymisation/tables/folder -l /path/to/libraries \
               -lsc /path/to/sensitive/cloud/libraries
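
As an illustration of how the five flags in the command above could be read, here is a minimal argparse sketch. It is not the repository's actual main.py: only the short flags (-s, -d, -t, -l, -lsc) come from the command line shown here, while the attribute names (source, destination, tables, libraries, libraries_sc), the help texts and the choice to make every flag required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the README command."""
    parser = argparse.ArgumentParser(description="Pseudonymise sequencing runs")
    # Attribute names (dest=...) are assumptions made for this sketch.
    parser.add_argument("-s", dest="source", required=True,
                        help="folder with runs to pseudonymise")
    parser.add_argument("-d", dest="destination", required=True,
                        help="destination folder on SensitiveCloud")
    parser.add_argument("-t", dest="tables", required=True,
                        help="folder with pseudonymisation tables")
    parser.add_argument("-l", dest="libraries", required=True,
                        help="folder with libraries")
    parser.add_argument("-lsc", dest="libraries_sc", required=True,
                        help="libraries folder on SensitiveCloud")
    return parser


if __name__ == "__main__":
    # Parse an explicit example list instead of sys.argv to keep the demo self-contained.
    demo = build_parser().parse_args([
        "-s", "/path/to/runs", "-d", "/path/to/destination",
        "-t", "/path/to/tables", "-l", "/path/to/libraries",
        "-lsc", "/path/to/sc/libraries",
    ])
    print(demo.source, demo.libraries_sc)
```

argparse looks for an exact option match first, so -l and -lsc can coexist without ambiguity.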

Using docker-compose

docker compose -f compose.dev.yml up -d --build

Test environment

Folder structure

/seq/NO-BACKUP-SPACE/test/
├── Libraries/ # Required library files for pseudonymisation
├── logs/ # Logs from test runs
└── TRANSFER/ # Input data to be pseudonymized
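
Before starting a test run it can help to confirm that the three folders listed above exist. The following is a small sketch of such a check; the helper name missing_test_dirs is hypothetical and not part of this repository, and the root path is simply the one shown in the tree.

```python
from pathlib import Path

# Subfolders taken from the folder structure shown in the README.
EXPECTED_DIRS = ("Libraries", "logs", "TRANSFER")


def missing_test_dirs(root):
    """Return the expected subfolder names that are not directories under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_test_dirs("/seq/NO-BACKUP-SPACE/test")
    print("missing:", missing or "none")
```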

Running a Test

  1. Copy the run you want to test into /seq/NO-BACKUP-SPACE/test/TRANSFER/:
cp -a /path/to/original/run/ /seq/NO-BACKUP-SPACE/test/TRANSFER/
  2. Switch to the export user and navigate to the script folder:
su export
cd ~/data-catalogue-pseudonymisation
  3. Start the pseudonymization script:
docker compose -f compose.test.yml up --build

Viewing logs

Logs for each run are in the /seq/NO-BACKUP-SPACE/test/logs directory. To view all service logs:

docker compose -f compose.test.yml logs

In production

Using docker-compose

# connect to seq server
su export
cd /home/export/data-catalogue-pseudonymisation
docker compose -f compose.prod.yml up --build -d

Deployment in cron

# connect to seq server
su export
crontab -e
# setting cron to run every Monday, Wednesday, Friday at 22:00
0 22 * * 1,3,5 /usr/local/bin/docker-compose -f /home/export/data-catalogue-pseudonymisation/compose.prod.yml up -d >> /home/export/logs/`date +\%Y\%m\%d\%H\%M\%S`.log 2>&1
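
To make the schedule concrete, the sketch below checks whether a given time falls on the 0 22 * * 1,3,5 schedule and builds the timestamped log file name used in the redirect. The function names matches_schedule and log_name are hypothetical and not part of this repository; cron itself does the real scheduling.

```python
from datetime import datetime

# Cron day-of-week 1,3,5 means Monday, Wednesday, Friday;
# datetime.isoweekday() uses the same numbers for those days.
RUN_DAYS = {1, 3, 5}


def matches_schedule(moment: datetime) -> bool:
    """True if moment is exactly 22:00 on a Monday, Wednesday or Friday."""
    return moment.minute == 0 and moment.hour == 22 and moment.isoweekday() in RUN_DAYS


def log_name(moment: datetime) -> str:
    """Log file name in the same format as date +%Y%m%d%H%M%S, plus .log."""
    return moment.strftime("%Y%m%d%H%M%S") + ".log"


if __name__ == "__main__":
    example = datetime(2024, 1, 1, 22, 0, 0)  # 1 January 2024 was a Monday
    print(matches_schedule(example), log_name(example))
```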

Deploying new version in production

su export
cd /home/export/data-catalogue-pseudonymisation
git switch main
git pull

The new version should start automatically in production the next time the cron job runs.
