Protein melting temperature (Tm) prediction using protein language models. Given a protein sequence (plus a 3D structure for the ESM3-MLP strategy), the models predict its melting temperature as a measure of thermal stability.
Three prediction strategies are available:
| Strategy | Model | Notes |
|---|---|---|
| Baseline MLP | ESM2 (650M) + MLP | Fastest to run |
| ESM2-LoRA | ESM2 fine-tuned with LoRA | Best sequence-only accuracy |
| ESM3-MLP | ESM3 (98B) + MLP | Requires a free API token and 3D protein structures (PDB) |
Figure 1: TmProt development workflow. The training dataset, ProMelt (comprising 45,441 proteins), was assembled by merging and length-filtering two proteomics-based sources: Meltome Atlas (n=41,916) and ProThermDB (n=3,525). The resulting dataset was split at 25% maximum sequence identity into training, validation, and test subsets. Three modeling strategies of increasing complexity were evaluated:
- Baseline MLP uses frozen ESM-2 embeddings as input features to an MLP regressor with two hidden layers of 64 and 32 neurons, respectively
- ESM2-LoRA applies Low-Rank Adaptation (LoRA) fine-tuning directly to ESM-2 transformer blocks for end-to-end thermostability regression, where ΔW denotes the trainable low-rank weight updates while the original ESM-2 weights remain frozen
- ESM3-MLP generates structural embeddings from ESM-3 using protein structures retrieved from AlphaFoldDB as input to an MLP regressor with the same architecture as Baseline MLP
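To give a rough sense of scale for the trainable parts described above, the sketch below counts parameters for an MLP head with hidden layers of 64 and 32 neurons and for a single LoRA update ΔW = B·A. It assumes the MLP receives one 1280-dimensional ESM-2 (650M) embedding per protein; the LoRA rank of 8 is a hypothetical value chosen only for illustration, not the rank used in TmProt.

```python
def mlp_param_count(input_dim: int, hidden: tuple[int, ...], output_dim: int = 1) -> int:
    """Count weights and biases of a fully connected regressor."""
    sizes = (input_dim, *hidden, output_dim)
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Count trainable parameters of one LoRA update, delta_W = B @ A."""
    # B has shape (d_out, rank) and A has shape (rank, d_in).
    return rank * (d_out + d_in)


EMBED_DIM = 1280  # hidden size of the 650M-parameter ESM-2 model
LORA_RANK = 8     # hypothetical rank, for illustration only

mlp_params = mlp_param_count(EMBED_DIM, (64, 32))
lora_params = lora_param_count(EMBED_DIM, EMBED_DIM, LORA_RANK)
full_params = EMBED_DIM * EMBED_DIM  # size of the frozen matrix being adapted

print(f"MLP head (1280 -> 64 -> 32 -> 1): {mlp_params:,} parameters")
print(f"LoRA update for one 1280 x 1280 matrix: {lora_params:,} of {full_params:,}")
```

Under these assumptions the low-rank update trains about 1.25% of the parameters of the matrix it adapts, which is why the original ESM-2 weights can stay frozen.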
The currently deployed version of TmProt 1.0 is based on the ESM2-LoRA strategy.
The TmProt web server is available at: https://loschmidt.chemi.muni.cz/tmprot
Use the web interface for quick predictions without setting up the local environment.
The production ESM2-LoRA model (TmProt 1.0) is bundled as a standalone CLI tool. If you only want to predict thermostability, install it and skip everything from Step 1 onward. If you want to run the full training and evaluation pipeline on your own data or reproduce our results, start with Step 1.
cd tmprot-1.0
pip install -e .

Predict thermostability for a FASTA file:

tmprot --input proteins.fasta --outdir predictions/ --threshold 60.0

Arguments:
- `--input`, `-i`: FASTA file with protein sequences (required)
- `--outdir`, `-o`: Output directory for CSV results (optional)
- `--threshold`, `-t`: Thermostability threshold in °C (default: 60.0)
- `--delimiter`, `-d`: CSV delimiter (default: tab)
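When several FASTA files need to be processed, the command above can be assembled programmatically. The following is a minimal sketch that uses only the flags listed here; `build_tmprot_command`, `run_batch`, and the file name `more_proteins.fasta` are hypothetical names introduced for illustration. By default it only prints the commands; set `dry_run=False` to execute them, which assumes `tmprot` is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_tmprot_command(fasta: Path, outdir: Path, threshold: float = 60.0) -> list[str]:
    """Assemble the tmprot argument list for one FASTA file."""
    return [
        "tmprot",
        "--input", str(fasta),
        "--outdir", str(outdir),
        "--threshold", str(threshold),
    ]


def run_batch(fasta_files, outdir, threshold=60.0, dry_run=True):
    """Print (dry run) or execute one tmprot call per FASTA file."""
    commands = [build_tmprot_command(Path(f), Path(outdir), threshold) for f in fasta_files]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if tmprot exits with an error.
            subprocess.run(cmd, check=True)
    return commands


commands = run_batch(["proteins.fasta", "more_proteins.fasta"], "predictions/")
```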
Example output (predictions/proteins.csv):
Rank ID Predicted Tm [°C] Thermostable
1 protein_A 65.5 Yes
2 protein_B 54.2 No
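The tab-delimited output shown above can be post-processed with the standard library, for example to re-label proteins at a different threshold without re-running the model. This sketch assumes the file header matches the example exactly, and it assumes that "thermostable" means a predicted Tm at or above the threshold; `read_predictions` and `relabel` are hypothetical helper names.

```python
import csv
import io

# The example output from above, reproduced as an in-memory file.
EXAMPLE_OUTPUT = (
    "Rank\tID\tPredicted Tm [°C]\tThermostable\n"
    "1\tprotein_A\t65.5\tYes\n"
    "2\tprotein_B\t54.2\tNo\n"
)


def read_predictions(handle, delimiter="\t"):
    """Return (ID, predicted Tm) pairs from a tmprot results file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [(row["ID"], float(row["Predicted Tm [°C]"])) for row in reader]


def relabel(predictions, threshold):
    """Map each ID to True if its predicted Tm is at or above the threshold."""
    # Assumption: the comparison is inclusive (>=).
    return {protein_id: tm >= threshold for protein_id, tm in predictions}


predictions = read_predictions(io.StringIO(EXAMPLE_OUTPUT))
labels_50 = relabel(predictions, 50.0)
print(labels_50)
```

To read a real results file, pass `open("predictions/proteins.csv", newline="", encoding="utf-8")` in place of the `io.StringIO` object.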
For more details, see the tmprot-1.0 package README.
Don't want to set up anything? Use the web server instead.
Follow the steps below to reproduce the full TmProt training pipeline or train your own models.
Option A: Conda (recommended for development)
git clone git@github.com:loschmidt/TmProt.git
cd TmProt
conda env create -f environment.yaml
conda activate tmprot

Option B: Pip
git clone git@github.com:loschmidt/TmProt.git
cd TmProt
python -m venv venv
source venv/bin/activate
pip install -e .

Data is hosted on Zenodo. Run the setup script to download it:
# Download everything (recommended for first run)
python scripts/setup_data.py
# Or download only specific datasets
python scripts/setup_data.py --datasets brenda fireprot ered_wt ered_asr cas hld

Data will be placed in data/ automatically.
Zenodo record: https://zenodo.org/records/20067528
Copy the example file:
cp .env.example .env

- Running Baseline MLP or ESM2-LoRA? No changes needed; leave `.env` as is.
- Running ESM3-MLP? Add your ESM3 API token (free, see ESM3 Setup below).
- Want experiment tracking? Fill in the DagsHub credentials for MLflow logging.
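Before launching ESM3-MLP, it can be useful to confirm that the token in `.env` has actually been replaced. The sketch below parses simple `KEY=value` lines and checks the `ESM3_API_TOKEN` entry. It is a hypothetical stand-alone check, not how the TmProt pipeline itself loads `.env`, and it handles only plain assignments. The demo writes a temporary file rather than touching a real `.env`.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def esm3_token_is_set(env, placeholder="your_token_here"):
    """True if ESM3_API_TOKEN is present and is not the placeholder text."""
    token = env.get("ESM3_API_TOKEN", "")
    return bool(token) and token != placeholder


# Demo on a temporary file that still contains the placeholder.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# example\nESM3_API_TOKEN=your_token_here\n", encoding="utf-8")
    demo_env = read_env_file(env_path)
    print(esm3_token_is_set(demo_env))
```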
python src/scripts/run_strategies.py --strategies all

# Baseline MLP: ESM2 embeddings + sklearn MLP
python src/scripts/run_strategies.py --strategies baseline_mlp
# ESM2-LoRA: fine-tuned ESM2 with LoRA adapters
python src/scripts/run_strategies.py --strategies esm2_lora
# ESM3-MLP: ESM3 embeddings + sklearn MLP (requires API token, see below)
python src/scripts/run_strategies.py --strategies esm3_mlp

Results are saved to models/{strategy_name}/, with one CSV and one JSON file per evaluation dataset.
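After a run, the result files can be listed per strategy with a few lines of Python. This sketch relies only on the layout stated above (one sub-directory per strategy, containing CSV and JSON files); the function name `collect_result_files` and the file names in the demo are hypothetical, and the contents of the files are not inspected.

```python
import tempfile
from pathlib import Path


def collect_result_files(models_dir):
    """Map each strategy directory name to its sorted CSV/JSON file names."""
    results = {}
    for strategy_dir in sorted(Path(models_dir).iterdir()):
        if strategy_dir.is_dir():
            results[strategy_dir.name] = sorted(
                p.name for p in strategy_dir.iterdir() if p.suffix in {".csv", ".json"}
            )
    return results


# Demo on a temporary directory that mimics models/{strategy_name}/.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "models" / "baseline_mlp"
    demo_dir.mkdir(parents=True)
    (demo_dir / "example_dataset.csv").touch()
    (demo_dir / "example_dataset.json").touch()
    found = collect_result_files(Path(tmp) / "models")
    print(found)
```

On a real checkout, call `collect_result_files("models")` from the repository root.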
python src/scripts/run_strategies.py --strategies all --mlflow
# Requires DagsHub credentials in .env

ESM3-MLP needs a free API token from EvolutionaryScale to generate protein embeddings.
1. Get a token: visit https://forge.evolutionaryscale.ai/apikeys, sign up, and copy your token.
2. Add it to .env:
ESM3_API_TOKEN=your_token_here
3. Run the strategy; embeddings are generated automatically on first run:
python src/scripts/run_strategies.py --strategies esm3_mlp

Required: Provide PDB structures
To run the ESM3-MLP strategy, 3D protein structure files are required. Use the included download script to fetch AlphaFold2 structures for all UniProt IDs in the dataset:
# Download all available PDB structures
python scripts/download_pdb_structures.py
# See how many would be downloaded first
python scripts/download_pdb_structures.py --dry-run
# Resume after interruption (skips already-downloaded files)
python scripts/download_pdb_structures.py --resume

PDB files are placed in data/pdb_structures/{ProteinID}.pdb automatically. The pipeline then uses them to generate structural embeddings with ESM-3.
Note: Only datasets with UniProt IDs (ProMelt, BRENDA, FireProt) can be auto-downloaded. Custom-ID datasets (CAS, ERED, HLD) fall back to sequence-only mode. For UniProt IDs missing from AlphaFold DB, we recommend OmegaFold to generate the missing structures — it runs locally on GPU without a database dependency and was used in the original TmProt study to cover proteins not found in AlphaFold DB.
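To find out which proteins still need a structure (for example, before generating the missing ones with OmegaFold), the expected file names can be checked against the download directory. This is a minimal sketch based on the `data/pdb_structures/{ProteinID}.pdb` naming described above; `find_missing_structures` is a hypothetical helper and the IDs in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def find_missing_structures(protein_ids, pdb_dir):
    """Return the IDs that have no {ProteinID}.pdb file in pdb_dir."""
    pdb_dir = Path(pdb_dir)
    return [pid for pid in protein_ids if not (pdb_dir / f"{pid}.pdb").is_file()]


# Demo on a temporary directory containing one placeholder structure file.
with tempfile.TemporaryDirectory() as tmp:
    demo_pdb_dir = Path(tmp) / "pdb_structures"
    demo_pdb_dir.mkdir()
    (demo_pdb_dir / "P12345.pdb").touch()
    missing = find_missing_structures(["P12345", "Q99999"], demo_pdb_dir)
    print(missing)
```

On a real checkout, pass `"data/pdb_structures"` as `pdb_dir` together with the protein IDs from your dataset.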
After running one or more strategies, generate comparison plots:
python src/eval/ranking.py

This saves ROC curves, enrichment plots, and scatter plots to images/.
tmprot-1.0/ # Standalone CLI package (pip install -e .)
  src/tmprot/
    cli.py # tmprot command entry point
    helpers.py # Model loading (ESM2-LoRA)
    model/ # Pre-trained LoRA adapter
scripts/
  setup_data.py # Download datasets from Zenodo
  download_pdb_structures.py # Download AlphaFold2 PDB files for ESM3-MLP
src/
  strategies/ # Three prediction strategies (baseline_mlp, esm2_lora, esm3_mlp)
  scripts/ # Entry points: run_strategies.py, extract_esm2_embeddings.py
  eval/ # Metrics, plotting, ranking
  data/ # Data loading and tokenization
  models/ # LoRA model loading
  training/ # ESM2-LoRA trainer
data/
  promelt/ # ProMelt training and test sets
  evaluation_sets/ # 6 independent evaluation datasets
  pdb_structures/ # PDB files for ESM3-MLP (auto-downloaded)
models/ # Output: saved models, predictions, metrics
images/ # Output: scatter plots, ROC curves, enrichment plots
- `--run_name NAME`: save outputs to a custom directory, models/{NAME}/
- `--mlflow`: log params, metrics, and model artifacts to DagsHub
If you use TmProt in your research, please cite:

