Implementation of Llama & Mamba Model

This repo is mainly adapted from https://github.com/som-shahlab/hf_ehr/blob/main/README.md, the codebase that accompanies the Context Clues paper.

Step 1. Installation
------------------------

Direct install:

```bash
pip install hf-ehr
```

For faster Mamba runs, also install the optional kernels:

```bash
pip install mamba-ssm causal-conv1d
```

Development install (run from the root of your local clone):

```bash
conda create -n hf_env python=3.10 -y
conda activate hf_env
pip install -r requirements.txt --no-cache-dir
pip install -e .
```
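To confirm that the optional Mamba kernels are importable in the active environment, a quick shell check like the sketch below can help. It assumes the two pip packages import as `mamba_ssm` and `causal_conv1d`, and that `python3` is the interpreter of the active environment (e.g. `hf_env`); `check_module` is a hypothetical helper, not part of hf_ehr.

```bash
#!/usr/bin/env bash
# Sketch: report whether the optional Mamba kernels can be imported.
# ASSUMPTION: `python3` is the interpreter of the active environment.
check_module() {
  # The import succeeds only if the package is installed (and loadable)
  # for this interpreter
  if python3 -c "import $1" 2>/dev/null; then
    echo "$1: available"
  else
    echo "$1: missing"
  fi
}

for mod in mamba_ssm causal_conv1d; do
  check_module "$mod"
done
```

A `missing` result means these kernels were not found; this README treats them as an optional speedup for Mamba runs rather than a requirement.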

Step 2. Pretrain Llama & Mamba Model
------------------------

Pretraining consists of three parts: dataset preparation, tokenizer creation, and model training.

Your custom EHR dataset must first be converted to either the MEDS data standard or the format used by the FEMR package.

For details on tokenizer creation, see https://github.com/som-shahlab/hf_ehr/blob/main/hf_ehr/tokenizers/README.md. An example of creating the cookbook tokenizer:

```bash
cd hf_ehr/scripts/
python -u -m hf_ehr.tokenizers.create_cookbook \
    --dataset MEDSDataset \
    --path_to_dataset_config .../hf_ehr/configs/data/meds_mimic4.yaml \
    --path_to_tokenizer_config .../hf_ehr/configs/tokenizer/cookbook.yaml \
    --n_procs 64 \
    --chunk_size 10000 \
    --is_force_refresh
```

Set `--path_to_dataset_config` to the YAML config of your preprocessed dataset and `--path_to_tokenizer_config` to the tokenizer's YAML config. The tokenizer YAML file also determines where the tokenizer vocabulary file is stored, so edit it to change that location.
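Because the tokenizer script reads both YAML files, a small preflight check can catch a mistyped path before a long multi-process job starts. This is a sketch only: `check_configs` and `HF_EHR_ROOT` are hypothetical names, and nothing is assumed about hf_ehr beyond the two config flags shown above.

```bash
#!/usr/bin/env bash
# Sketch: fail fast if any of the given config files does not exist.
# `check_configs` is a hypothetical helper, not part of hf_ehr.
check_configs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing config: $f" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is missing
  return "$missing"
}

# Example usage (HF_EHR_ROOT is a hypothetical variable pointing at your clone):
# check_configs "$HF_EHR_ROOT/hf_ehr/configs/data/meds_mimic4.yaml" \
#               "$HF_EHR_ROOT/hf_ehr/configs/tokenizer/cookbook.yaml" \
#   && python -u -m hf_ehr.tokenizers.create_cookbook <same flags as above>
```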

Then launch a Llama pretraining run on the preprocessed dataset and tokenizer using `run.py`:

```bash
cd hf_ehr/scripts/
python3 -m hf_ehr.scripts.run \
    +data=meds_mimic4 \
    +trainer=multi_gpu_4 \
    +model=llama-base \
    +tokenizer=cookbook_k \
    data.dataloader.mode=approx \
    data.dataloader.approx_batch_sampler.max_tokens=16384 \
    data.dataloader.max_length=8192 \
    trainer.devices=[0,1,2,3] \
    logging.wandb.name=mimic4-llama-run \
    main.is_force_restart=True \
    main.path_to_output_dir=hf_ehr/cache/runs/llama_8k
```
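As their names suggest, the two dataloader overrides interact: `max_length` caps each patient sequence, while `approx_batch_sampler.max_tokens` caps the tokens packed into one batch. The arithmetic below is a sketch for sanity-checking these values. `batch_budget` is a hypothetical helper, and the tokens-across-devices figure assumes `max_tokens` applies per device with no gradient accumulation, which this README does not confirm.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the dataloader token budget used in the run above.
# ASSUMPTIONS: max_length caps each sequence, max_tokens caps tokens per batch,
# and max_tokens applies per device with no gradient accumulation.
batch_budget() {
  local max_tokens=$1 max_length=$2 n_devices=$3
  # A batch must be able to hold at least one full-length sequence
  if (( max_tokens < max_length )); then
    echo "error: max_tokens ($max_tokens) < max_length ($max_length)" >&2
    return 1
  fi
  # Full-length sequences that fit in one batch, and tokens across all devices
  echo "$(( max_tokens / max_length )) $(( max_tokens * n_devices ))"
}

batch_budget 16384 8192 4   # prints: 2 65536
```

With the settings above, each batch has room for two full-length 8192-token sequences (more when sequences are shorter), and under the stated assumption the four GPUs together process at most 65536 tokens per step.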

Step 3. Extract Patient Representations using Llama & Mamba
------------------------

We divide our downstream tasks into two groups: phenotype tasks and patient outcome tasks. Each group runs from its own `.sh` script, which you can customize with your own tasks. Run them with:

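```bash
bash hf_ehr/fine_tune_cumc/outcome.sh \
    "$model_type" \
    "$model_checkpoint_path" \
    "$input_meds" \
    "$device"

bash hf_ehr/fine_tune_cumc/phenotype.sh \
    "$model_type" \
    "$model_checkpoint_path" \
    "$input_meds" \
    "$device"
```

Here `$model_type` is the model architecture (Llama or Mamba), `$model_checkpoint_path` is the checkpoint pretrained in Step 2, `$input_meds` is the input dataset in MEDS format, and `$device` is the device to run on.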
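When evaluating several checkpoints, a small wrapper can run both task groups with identical arguments. This is a sketch under assumptions: `run_task_groups` is a hypothetical helper, the scripts are assumed to live at `hf_ehr/fine_tune_cumc/outcome.sh` and `hf_ehr/fine_tune_cumc/phenotype.sh`, and the wrapper forwards its arguments unchanged, so it does not depend on how each script parses them. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: run the outcome and phenotype task scripts with the same arguments.
# ASSUMPTION: the scripts live under hf_ehr/fine_tune_cumc/ relative to the
# current directory; adjust the path if your checkout differs.
run_task_groups() {
  local group script
  for group in outcome phenotype; do
    script="hf_ehr/fine_tune_cumc/${group}.sh"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Print the command instead of executing it
      echo "bash $script $*"
    else
      # Forward all arguments unchanged; stop at the first failing group
      bash "$script" "$@" || return 1
    fi
  done
}

# Example with hypothetical values (model type, checkpoint, MEDS path, device):
# DRY_RUN=1 run_task_groups llama /path/to/checkpoint.ckpt /path/to/meds 0
```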
