This repository hosts the official research and computational framework for the Olfactory Language Model (OLM). Digital devices reproduce sight (RGB pixels) and sound (audio waveforms) well, but the digital transmission of scent remains largely unsolved. This project bridges chemistry and computer science by treating molecules as tokens in a language: it encodes molecular structure as continuous, multi-dimensional odor vectors, with the goal of enabling computer-synthesized scent reproduction and non-invasive medical breath analysis.
Every distinct scent is produced by a specific combination of airborne volatile organic compounds (VOCs). Our framework models this with two core components:
- Molecular-to-Odor Embedding: Uses Graph Neural Networks (GNNs) to read the 2D/3D atomic topology of a compound and map it into an odor coordinate space.
- The "Digital RGB" for Scents: Standardizes a vector space in which each value represents a baseline olfactory pillar (e.g., floral, musky, pungent, sweet), so that the vector can guide digital scent-diffusing hardware or clinical diagnostic setups.

Key features:
- Graph-to-Vector Mapping Architecture: Converts SMILES strings or SDF records into molecular graphs and extracts structural features with Graph Isomorphism Networks (GIN).
- Cross-Modal Olfactory Alignment: Employs a contrastive learning framework (similar to CLIP) to align chemical structure embeddings with natural language scent descriptors.
- Medical Breath Diagnostics Tooling: Feature maps trained to detect trace shifts in breath VOCs that may indicate early-stage metabolic or pulmonary disease.
- Hardware Diffusion Controls: Produces proportional output matrices that can drive microfluidic piezoelectric cartridges in scent-diffusing hardware.
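
To make the graph-to-vector step more concrete, here is a minimal sketch of a single GIN-style update followed by a sum readout, written in plain Python so it runs without dependencies. It is not this repository's implementation: the function names (`gin_layer`, `readout`), the one-hot atom features, and the identity weights are all assumptions chosen to keep the arithmetic easy to follow, and a single linear layer with ReLU stands in for the MLP a real GIN layer would use.

```python
# Minimal sketch of one Graph Isomorphism Network (GIN) update in pure Python.
# The toy graph, feature encoding, and weights are illustrative assumptions,
# not values taken from this repository.


def gin_layer(features, adjacency, weights, bias, eps=0.0):
    """Apply h_v' = ReLU(W @ ((1 + eps) * h_v + sum of neighbor h_u) + b)."""
    updated = {}
    for node, h in features.items():
        # Start from the scaled self feature, then sum-aggregate neighbors.
        agg = [(1.0 + eps) * x for x in h]
        for nbr in adjacency[node]:
            agg = [a + x for a, x in zip(agg, features[nbr])]
        # One linear layer + ReLU stands in for the GIN MLP.
        updated[node] = [
            max(0.0, sum(w * a for w, a in zip(row, agg)) + b)
            for row, b in zip(weights, bias)
        ]
    return updated


def readout(features):
    """Sum-pool node features into one graph-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]


# Ethanol (SMILES "CCO") heavy-atom graph: C0 - C1 - O2.
# Assumed one-hot atom features: [is_carbon, is_oxygen].
ETHANOL_FEATURES = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
ETHANOL_ADJACENCY = {0: [1], 1: [0, 2], 2: [1]}

# Hypothetical identity weights and zero bias.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
BIAS = [0.0, 0.0]

hidden = gin_layer(ETHANOL_FEATURES, ETHANOL_ADJACENCY, WEIGHTS, BIAS)
graph_vector = readout(hidden)
print(graph_vector)  # [5.0, 2.0]
```

After one update, each atom's vector counts the carbons and oxygens in its immediate neighborhood (itself included), and the readout sums those counts over the molecule. In practice the SMILES or SDF parsing would be handled by a cheminformatics toolkit, and the weights would be learned rather than fixed.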
```
├── src/
│   ├── graph_pipelines/    # Chemical graph extraction from SMILES/SDF datasets
│   ├── models/             # Olfactory GNNs, Scent Transformers, and Contrastive Aligners
│   ├── embedding_space/    # Odor coordinate generation and vector mapping loops
│   └── hardware_api/       # Digital-to-Analog matrix conversion protocols for scent diffusers
├── data/                   # Preprocessing configurations for Dravnieks and GoodScents datasets
├── diagnostics/            # Specialized medical VOC analysis and lung disease signature weights
├── notebooks/              # Odor vector clustering, t-SNE graphs, and chemical maps
├── Literature_Review/      # Team research matrices and BibTeX reference files
└── README.md
```
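
The contrastive aligners under `src/models/` are described here only as "similar to CLIP". As a rough illustration of what such an objective computes, the sketch below implements a symmetric cross-entropy (InfoNCE-style) loss over a small batch of paired embeddings in plain Python. The function name `symmetric_contrastive_loss`, the temperature of 0.07, and the toy 2-D embeddings are assumptions for illustration, not code or settings from this repository.

```python
import math


def symmetric_contrastive_loss(mol_embs, text_embs, temperature=0.07):
    """CLIP-style loss: row i of mol_embs is paired with row i of text_embs."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    mols = [normalize(v) for v in mol_embs]
    texts = [normalize(v) for v in text_embs]

    # Cosine similarity between every molecule and every descriptor,
    # scaled by the temperature.
    logits = [
        [sum(m * t for m, t in zip(mv, tv)) / temperature for tv in texts]
        for mv in mols
    ]

    def mean_cross_entropy(matrix):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the molecule-to-text and text-to-molecule directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))


# Toy batch: two molecule embeddings and two descriptor embeddings.
molecule_vectors = [[1.0, 0.0], [0.0, 1.0]]
matched_descriptors = [[1.0, 0.0], [0.0, 1.0]]
swapped_descriptors = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(molecule_vectors, matched_descriptors)
misaligned_loss = symmetric_contrastive_loss(molecule_vectors, swapped_descriptors)
print(aligned_loss < misaligned_loss)  # True
```

The loss is near zero when each molecule embedding points the same way as its own descriptor and grows when the pairs are swapped, which is the signal that pulls matching structure and text embeddings together during training.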