jonasargelo/PerturbQA

 
 


Data distribution for “Contextualizing biological perturbation experiments through language”

This is the official data distribution for PerturbQA. If you find our work interesting, please check out our paper to learn more!

@inproceedings{
    wu2025perturbqa,
    title={Contextualizing biological perturbation experiments through language},
    author={Menghua Wu and Russell Littman and Jacob Levine and Lin Qiu and Tommaso Biancalani and David Richmond and Jan-Christian Huetter},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=5WEpbilssv}
}

Please see the README.md file that comes with the code distribution for details about the individual files.

License

The LLM outputs in the data distribution (summer_outputs.zip, summer_enrichment.zip, llm-nocot.zip, llm-noretrieve.zip) and results tables (results.zip) are licensed under the CC BY 4.0 license.

The knowledge graph entries and gene summaries (kg.zip, gene_summary.zip of the data distribution, respectively) are derived from the following datasets and are governed by the original licenses of these datasets:

  • UniProt: the Universal Protein Knowledgebase in 2023 Nucleic Acids Res. 51:D523–D531 (2023) (link)
    Made available under the terms of the CC BY 4.0 license.
  • Ensembl, Ensembl 2024. Nucleic Acids Res. 2024, 52(D1):D891–D899. PMID: 37953337. doi: 10.1093/nar/gkad1049 (link)
    Made available under the terms of the Apache 2.0 license.
  • Gene Ontology, 2024-01-17 release (DOI:10.5281/zenodo.10536401)
    Made available under the terms of the CC BY 4.0 license.
  • CORUM: the comprehensive resource of mammalian protein complexes–2022 Nucleic Acids Research, 51(D1):D539–D545 (link)
    Made available under the terms of the CC BY NC 4.0 license.
  • STRINGDB, Szklarczyk et al. Nucleic acids research 51.D1 (2023): D638-D646 (link)
    Made available under the terms of the CC BY 4.0 license.
  • Reactome, The Reactome Pathway Knowledgebase 2024. Nucleic Acids Research. 2024. doi: 10.1093/nar/gkad1025. (link)
    Made available under the terms of the CC BY 4.0 license.
  • Bioplex, Huttlin et al. (2021) Cell 184(11):3022-3040. doi: 10.1016/j.cell.2021.04.011. (link)

Please note that CORUM is licensed under CC BY NC 4.0, which does not permit commercial use.
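
If you want to track these terms programmatically, the license list above can be transcribed into a small lookup table. The following is only an illustrative sketch: the names `SOURCE_LICENSES`, `non_commercial_sources`, and `unstated_sources` are hypothetical and not part of the data or code distribution, and the table reflects only what is stated in this README.

```python
# Hypothetical lookup table transcribed from the license list in this README.
# It is not shipped with the data distribution.
SOURCE_LICENSES = {
    "UniProt": "CC BY 4.0",
    "Ensembl": "Apache 2.0",
    "Gene Ontology": "CC BY 4.0",
    "CORUM": "CC BY NC 4.0",
    "STRINGDB": "CC BY 4.0",
    "Reactome": "CC BY 4.0",
    "Bioplex": None,  # this README does not state a license for Bioplex
}


def non_commercial_sources(licenses):
    """Return the sources whose license string carries the NC (non-commercial) term."""
    return sorted(
        name
        for name, lic in licenses.items()
        if lic is not None and "NC" in lic.split()
    )


def unstated_sources(licenses):
    """Return the sources for which no license is recorded in the table."""
    return sorted(name for name, lic in licenses.items() if lic is None)


if __name__ == "__main__":
    print("Non-commercial:", non_commercial_sources(SOURCE_LICENSES))
    print("License not stated here:", unstated_sources(SOURCE_LICENSES))
```

A table like this only mirrors the README; the original license texts of each dataset remain authoritative.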
