sumeetg23/data-sandbox

data-sandbox

  1. targeted-bioactivity-analysis: A Jupyter Notebook that examines the bioactivity and drug-likeness of compounds targeting the SARS coronavirus 3C-like proteinase (ChEMBL target CHEMBL3927), using RDKit, the ChEMBL API, and statistical analysis.

  2. SpatialTranscriptomics-ML: This project implements machine learning models to classify spatial transcriptomics data. The code uses Scanpy, Squidpy, PyTorch, and PyTorch Geometric (for graph learning) to explore and compare deep learning approaches on spatially resolved transcriptomics data.

  3. cross-species-genomics-ml: A Jupyter Notebook that implements a DNA sequence classification pipeline using k-mer tokenization and Naive Bayes models, and compares classification performance across human, chimpanzee, and dog DNA.

  4. tableau-clinical-trials-dashboard: A Tableau dashboard that explores clinical trial metrics using data from ClinicalTrials.gov.

  5. llm-ngs-qc-parser: This project implements a pipeline to parse Next-Generation Sequencing (NGS) quality control (QC) reports, focusing on Bioanalyzer PDF outputs. It uses LLMs to extract sample information and quality metrics from the PDFs and writes the results as structured JSON and CSV.
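
For readers unfamiliar with the k-mer tokenization mentioned in project 3, here is a minimal sketch of the general idea: a DNA sequence is split into overlapping substrings of length k, which can then be counted and fed to a bag-of-words classifier such as Naive Bayes. The function name `kmer_tokenize` and the default `k=6` are assumptions made for illustration; the notebook's actual code and choice of k may differ.

```python
from collections import Counter


def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration; k=6 is an assumed default.
    Sequences shorter than k yield an empty list.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Tokenize a short toy sequence and count how often each k-mer occurs.
tokens = kmer_tokenize("ATGCATGCA", k=4)
counts = Counter(tokens)
print(tokens)
print(counts)
```

The k-mer counts play the role of word counts in text classification, which is why a Naive Bayes model is a natural fit for the resulting features.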

About

A collection of exploratory data science and machine learning projects across bioinformatics, cheminformatics, and clinical data visualization.
