Conversation
Signed-off-by: Bruno Alvisio <balvisio@nvidia.com>
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@bionemo-recipes/models/esm2/pyproject.toml`:
- Line 17: The git-based peft dependency string in pyproject.toml currently pins
to a branch ("peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora"); replace the
branch ref with an immutable identifier (a specific commit SHA or an official
release tag) so the requirement becomes pinned to that commit/tag, update the
dependency entry accordingly, and verify the chosen SHA/tag exists in the peft
repo and builds correctly (look for the dependency line containing "peft @
git+https://github.com/balvisio/peft.git@...").
In `@bionemo-recipes/recipes/esm2_peft_te/checkpoint.py`:
- Line 1: The current checkpoint.py is a cross-recipe symlink to another
recipe's checkpoint logic; replace it by copying the checkpoint implementation
into this recipe (or extract the shared logic into a new common module and
import from there), add the proper per-file license header to the copied file,
remove any imports that reference other recipes, and update all references to
use the local checkpoint implementation (i.e., the functions/classes originally
provided by the external checkpoint module) so the recipe is fully
self-contained.
In `@bionemo-recipes/recipes/esm2_peft_te/dataset.py`:
- Around line 20-28: The file imports DataCollatorWithFlattening from
transformers but this recipe expects the local, recipe-specific implementation
in collator.py; change the import to pull DataCollatorWithFlattening from the
local collator module (the same place TokenPackingDataset is imported from) so
that the DataCollatorWithFlattening used by Dataset code matches the
recipe-specific signature and Flash Attention / THD-format parameters; update
the import list to reference collator.DataCollatorWithFlattening instead of
transformers.DataCollatorWithFlattening.
In `@bionemo-recipes/recipes/esm2_peft_te/distributed_config.py`:
- Line 1: The file distributed_config.py currently points to
../esm2_native_te/distributed_config.py (cross-recipe symlink); replace this
pointer by inlining the distributed_config implementation into this recipe (copy
the code from esm2_native_te/distributed_config.py into this recipe's
distributed_config.py) and add the required per-file license header at the top,
or alternatively extract any truly shared utilities into a new common module
outside recipes (e.g., a shared package) and import that instead; ensure the new
distributed_config.py in this recipe contains no imports from other recipes and
includes the proper license header.
In `@bionemo-recipes/recipes/esm2_peft_te/Dockerfile`:
- Around line 1-12: The Dockerfile is copying files from esm2_native_te
(checkpoint.py, collator.py, distributed_config.py, scheduler.py) which violates
the self-contained recipe rule; remove those COPY lines and either vendor those
helper modules into esm2_peft_te (add them under esm2_peft_te/ and update any
imports) or move them into a shared non-recipe package, then update the
Dockerfile to only COPY esm2_peft_te/ and its requirements; ensure any import
paths in the code reference the vendored modules (e.g., esm2_peft_te.checkpoint,
esm2_peft_te.collator, etc.) so the image no longer depends on esm2_native_te.
In `@bionemo-recipes/recipes/esm2_peft_te/infer.py`:
- Around line 27-36: Add a Google-style docstring to the _batched_inference
function: document the function purpose in one line, then an Args section
listing model, tokenizer, records, batch_size (int), max_seq_length (int),
stride (int), infer_overflowing_aas (bool), and device (str) with short
descriptions and types, and a Returns section describing the tuple of list[str]
(predicted sequences) and list[int] (corresponding lengths/ids) following
pydocstyle conventions; place the docstring immediately below the def
_batched_inference(...) signature and ensure proper triple-quote formatting and
punctuation. (A sketch of such a docstring follows after this list of items.)
- Around line 46-76: The code assumes inputs contains
"overflow_to_sample_mapping" but when tokenizer was called with
return_overflowing_tokens/infer_overflowing_aas=False that key is missing;
before using overflow_map (variable overflow_map) in the inner loop (inside the
block that constructs sub_inputs and iterates preds) guard access by checking if
"overflow_to_sample_mapping" is in inputs and set overflow_map =
inputs.pop("overflow_to_sample_mapping", None); when overflow_map is None,
compute original_idx using i (outer sample index) directly (or j+k) without
indexing overflow_map, and keep appending to sequences_to_sample_mapping and
predictions as before to avoid KeyError; update references in the loop where
original_idx is assigned (original_idx = i + overflow_map[j + k].item()) to
handle both cases.
- Line 108: Replace the hardcoded tokenizer checkpoint with the model-derived
tokenizer: modify the AutoTokenizer.from_pretrained call (where tokenizer =
AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")) to load from the
runtime/config value (e.g., args.model_tag or a config field tokenizer_name) so
the tokenizer matches the model loaded elsewhere in infer.py; also add
tokenizer_name: ${model_tag} to hydra_config/defaults_infer.yaml so the
tokenizer checkpoint can be overridden via config. Ensure you reference the same
symbol used to load the model (args.model_tag or the config object) when calling
AutoTokenizer.from_pretrained.
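Regarding the docstring item above (around lines 27-36), a sketch of what a Google-style docstring could look like. The parameter list follows the review comment; the descriptions and exact signature ordering are assumptions, not text taken from the implementation:

def _batched_inference(model, tokenizer, records, batch_size, max_seq_length, stride, infer_overflowing_aas, device):
    """Run batched secondary-structure inference over the input records.

    Args:
        model: Token-classification model used to produce per-residue logits.
        tokenizer: Tokenizer matching the model checkpoint.
        records: Input records (sequences, optionally with identifiers) to predict on.
        batch_size (int): Number of tokenized windows per forward pass.
        max_seq_length (int): Maximum tokenized sequence length.
        stride (int): Overlap between overflowing windows.
        infer_overflowing_aas (bool): Whether to also run inference on overflowing residues.
        device (str): Device on which inference runs, e.g. "cuda" or "cpu".

    Returns:
        tuple[list[str], list[int]]: Predicted label strings and, for each prediction,
        the index of the original record it maps to.
    """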
In `@bionemo-recipes/recipes/esm2_peft_te/requirements.txt`:
- Around line 2-6: The git-based peft dependency is pinned to a mutable branch;
update the requirement line "peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora" to reference an
immutable identifier by replacing the branch name with a specific commit SHA or
released tag (e.g., @<commit-sha> or `@vX.Y.Z`) so builds are reproducible and
supply-chain safe; ensure the updated string remains in requirements.txt as
"peft @ git+https://github.com/balvisio/peft.git@<commit-or-tag>".
In `@bionemo-recipes/recipes/esm2_peft_te/scheduler.py`:
- Line 1: This file currently redirects to or imports from
esm2_native_te.scheduler which violates recipe isolation; replace that
cross-recipe dependency by copying the required scheduler logic into this recipe
or implementing a local equivalent. Identify the exported symbols you depend on
(e.g., Scheduler class, create_scheduler or get_scheduler factory function, and
any helper functions like schedule_task or init_scheduler) from the
esm2_native_te implementation, reproduce their behavior locally inside
bionemo-recipes/recipes/esm2_peft_te/scheduler.py, update any local imports to
use the new local implementations, and remove any import lines that reference
esm2_native_te.scheduler so the recipe is fully self-contained.
- Line 1: The file currently contains a bare path string which raises a
SyntaxError; replace it with a valid Python implementation or a re-export
import: remove the literal path and add an import that re-exports the native
scheduler (e.g., import or from import of esm2_native_te.scheduler) or implement
a minimal wrapper function/class matching this package's expected API (e.g.,
functions/classes used elsewhere that reference scheduler.py) so the module can
be imported without error; update symbols to match callers if you implement a
wrapper.
In `@bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py`:
- Line 25: The recipe imports NVEsmForConvTokenClassification from
modeling_esm_te in train_lora_convnet.py which pulls code from another recipe
and breaks self-containment; either vendor the required module into this recipe
(add modeling_esm_te.py with NVEsmForConvTokenClassification implementation
alongside train_lora_convnet.py and update the package/module path) or declare
and install the external esm package from models/esm2 (add it to
requirements.txt and update the Dockerfile to copy/install that package) so that
the import in train_lora_convnet.py resolves without referencing code outside
this recipe.
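As one possible shape for the second option, a hedged sketch of the Dockerfile change; the source path and install location are assumptions (they depend on the Docker build context), not verified against this repository:

# Sketch only: vendor the ESM-2 model package instead of copying from another recipe.
# Assumes the Docker build context is the repository root.
COPY bionemo-recipes/models/esm2 /opt/esm2
RUN pip install --no-cache-dir /opt/esm2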
🟡 Minor comments (15)
bionemo-recipes/recipes/esm2_peft_te/scheduler.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟡 Minor

Add license header and Google-style module docstring.
The file is missing the required license header and module docstring. Pre-commit hooks will likely fail.
As per coding guidelines: “Ensure license headers are present in all files…” and “Use Google-style docstrings (pydocstyle).”
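A minimal sketch of what the top of scheduler.py could look like; the header wording and license identifier below are assumptions, so copy the exact header text from a neighboring recipe file:

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
"""Learning-rate scheduler utilities for the esm2_peft_te recipe."""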
.github/workflows/unit-tests-recipes.yml-158-160 (1)
158-160: ⚠️ Potential issue | 🟡 Minor

Avoid a hard-coded safe.directory path to prevent CI breakage in forks.

Using a fixed /__w/bionemo-framework/bionemo-framework path can fail when the repo name changes (forks or mirrors), triggering "dubious ownership" errors. Prefer $GITHUB_WORKSPACE so the command is resilient.

🛠️ Proposed fix

- run: git -c safe.directory=/__w/bionemo-framework/bionemo-framework sparse-checkout add bionemo-recipes/recipes/esm2_native_te
+ run: git -c safe.directory="$GITHUB_WORKSPACE" sparse-checkout add bionemo-recipes/recipes/esm2_native_te

bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/config.json-31-42 (1)
31-42: ⚠️ Potential issue | 🟡 Minor

Add a comment explaining the "L" label in label2id.

The label2id mapping includes "L": 2, which is not a standard DSSP secondary structure code. While this label is used consistently throughout the recipe (mapped to the coil class), it's undocumented what "L" represents. Add an inline comment clarifying whether "L" represents "Loop" or another designation, and explain why it's included in the label scheme alongside standard DSSP codes.
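Since JSON cannot hold comments, one hedged option is to document the mapping next to the label constants in utils.py instead (the constant name follows the recipe's imports; the values shown are illustrative, keep the recipe's existing mapping):

# DSSP-style 3-class mapping. "L" (loop/irregular) is emitted by some secondary-structure
# datasets for residues without an assigned H/E state; this recipe treats it as coil.
SS3_LABEL2ID = {"H": 0, "E": 1, "L": 2}  # illustrative values only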
bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/README.md-1-200 (1)

1-200: ⚠️ Potential issue | 🟡 Minor

Replace placeholder fields before shipping the model card.

The model card is entirely "[More Information Needed]" placeholders. If this accompanies a published checkpoint, please fill in at least core metadata (license, training data, intended use, evaluation) or clearly mark it as a template to avoid shipping incomplete documentation.

bionemo-recipes/recipes/esm2_peft_te/README.md-101-107 (1)
101-107: ⚠️ Potential issue | 🟡 Minor

Use descriptive link text for the esm2_native_te README link.

Line 107 uses "here", which is not descriptive and triggers MD059. Suggest updating the anchor text.

Proposed fix

-For more information see [here](../esm2_native_te/README.md).
+For more information see the [esm2_native_te README](../esm2_native_te/README.md).

bionemo-recipes/models/esm2/src/esm/modeling_esm_te.py-684-707 (1)
684-707: ⚠️ Potential issue | 🟡 Minor

Add Google-style Args/Returns to the new conv head docstrings.

NVConvNetHead and NVEsmForConvTokenClassification docstrings are minimal and missing Args/Returns sections, and the __init__ docstring name is off. Please update them to Google-style to satisfy pydocstyle.

Example docstring updates

 class NVConvNetHead(nn.Module):
-    """Convolution based head for token classification."""
+    """Convolution-based head for token classification.
+
+    Args:
+        config (NVEsmConfig): Model configuration.
+    """
@@
-    def forward(self, features, **kwargs):
-        """Forward pass for the convolutional token classification head."""
+    def forward(self, features, **kwargs):
+        """Forward pass for the convolutional token classification head.
+
+        Args:
+            features (torch.Tensor): Input features of shape (batch, hidden, seq_len).
+            **kwargs: Unused keyword arguments.
+
+        Returns:
+            torch.Tensor: Logits of shape (batch, seq_len, num_labels).
+        """
@@
 class NVEsmForConvTokenClassification(NVEsmPreTrainedModel):
@@
-    def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+    def __init__(self, config):
+        """Initialize NVEsmForConvTokenClassification.
+
+        Args:
+            config (NVEsmConfig): Model configuration.
+        """

As per coding guidelines: Use Google-style docstrings following pydocstyle conventions.
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor

Fix docstring mismatch and redundant init_weights() call.

Two issues in the constructor:

- Line 706: Docstring says "Initialize NVEsmForTokenClassification" but this is NVEsmForConvTokenClassification.
- Lines 713-714: Calling both init_weights() and post_init() is redundant: post_init() already invokes init_weights() internally (see Hugging Face PreTrainedModel.post_init()). Other classes in this file (e.g., NVEsmForTokenClassification at line 640) only call post_init().

Proposed fix

     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()

bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor

Fix docstring mismatch and redundant init_weights() call.

Same issues as in other esm_nv.py files:

- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().

Proposed fix

     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()

bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor

Fix docstring mismatch and redundant init_weights() call.

Same issues as in esm2_peft_te/example_8m_checkpoint/esm_nv.py:

- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().

Proposed fix

     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()

bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor

Fix docstring mismatch and redundant init_weights() call.

Same issues as in other esm_nv.py files:

- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().

Proposed fix

     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()

bionemo-recipes/recipes/esm2_peft_te/infer.py-104-106 (1)
104-106: ⚠️ Potential issue | 🟡 Minor

Handle CPU-only environments or allow a device override.

The script hardcodes CUDA at line 106, which crashes on CPU-only systems. Additionally, the _batched_inference() call (lines 112-117) doesn't pass the device parameter despite the function supporting it.

🛠️ Suggested device handling

     # Load PEFT adapters on top
     peft_model = PeftModel.from_pretrained(base_model, args.peft_model_config_dir)
-    peft_model = peft_model.to("cuda").eval()
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    peft_model = peft_model.to(device).eval()
@@
-    predictions, sequences_to_sample_mapping = _batched_inference(
-        peft_model,
-        tokenizer,
-        records,
-        **args.inference,
-    )
+    inference_kwargs = dict(args.inference)
+    inference_kwargs.setdefault("device", device)
+    predictions, sequences_to_sample_mapping = _batched_inference(
+        peft_model,
+        tokenizer,
+        records,
+        **inference_kwargs,
+    )

bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py-170-196 (1)
170-196: ⚠️ Potential issue | 🟡 Minor

Guard against empty validation dataloader.

If validation yields zero batches, val_steps stays 0 and the averaging will raise. Add a guard to handle empty validation splits.

💡 Suggested fix

-    avg_val_loss = val_loss_total / val_steps
-    avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
+    if val_steps == 0:
+        avg_val_loss = 0.0
+        avg_val_acc = 0.0
+    else:
+        avg_val_loss = val_loss_total / val_steps
+        avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0

bionemo-recipes/recipes/esm2_peft_te/train_lora_ddp.py-171-197 (1)
171-197: ⚠️ Potential issue | 🟡 Minor

Guard against empty validation dataloader.

If the validation dataloader yields zero batches, val_steps remains 0 and this division will raise. Add a zero-batch guard.

💡 Suggested fix

-    avg_val_loss = val_loss_total / val_steps
-    avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
+    if val_steps == 0:
+        avg_val_loss = 0.0
+        avg_val_acc = 0.0
+    else:
+        avg_val_loss = val_loss_total / val_steps
+        avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0

bionemo-recipes/recipes/esm2_peft_te/utils.py-106-123 (1)
106-123: ⚠️ Potential issue | 🟡 Minor

Validate CSV headers and handle empty files.

The current code fails on empty CSV files (line 115: "pdb_id" in reader.fieldnames raises TypeError when fieldnames is None), and missing the sequence column causes an unhandled KeyError on line 120. Add upfront validation to provide clear error messages.

Suggested fix

 def load_csv(path: Path) -> list[dict]:
     """Read input CSV file for inference.
@@
-    with open(path) as f:
-        reader = csv.DictReader(f)
-        has_pdb_id = "pdb_id" in reader.fieldnames
+    with open(path, newline="") as f:
+        reader = csv.DictReader(f)
+        if reader.fieldnames is None:
+            raise ValueError("CSV must include a header with a 'sequence' column.")
+        if "sequence" not in reader.fieldnames:
+            raise ValueError("CSV header must include a 'sequence' column.")
+        has_pdb_id = "pdb_id" in reader.fieldnames

bionemo-recipes/recipes/esm2_peft_te/tests/test_train_lora.py-16-18 (1)
16-18: ⚠️ Potential issue | 🟡 Minor

Guard test_sanity_ddp_thd with a CUDA availability check to prevent failures in CPU-only environments.

torch.cuda.get_device_capability() raises if CUDA is unavailable. Add the skip guard at the start of the function to avoid test failures in CPU-only runs.

💡 Suggested fix

 import torch
+import pytest
 from hydra import compose, initialize_config_dir

 from train_lora_ddp import main as main_ddp


 def test_sanity_ddp_thd(tmp_path, monkeypatch, recipe_path):
+    if not torch.cuda.is_available():
+        pytest.skip("CUDA is required for DDP THD sanity test")
     if torch.cuda.get_device_capability() == (12, 0):
         # TODO(BIONEMO-2840): On sm120, we need to set NVTE_FUSED_ATTN to 0 since TE will choose fused attn by default,
🧹 Nitpick comments (7)
bionemo-recipes/recipes/esm2_peft_te/perf_logger.py (2)
139-142: Avoid mutating the caller's outputs object.

Directly modifying outputs.logits with unsqueeze(0) mutates the object passed by the caller, which could cause unexpected side effects in the training loop if the outputs are used elsewhere after this call.

Proposed fix: use a local variable

     # Handle sequence packing for torchmetrics calculation.
+    logits_for_perplexity = outputs.logits
     if outputs.logits.dim() < 3:
-        outputs.logits = outputs.logits.unsqueeze(0)
+        logits_for_perplexity = outputs.logits.unsqueeze(0)

-    self.metrics["train/perplexity"].update(outputs.logits, batch["labels"])
+    self.metrics["train/perplexity"].update(logits_for_perplexity, batch["labels"])
153-166: Inconsistent rank checks for logging.

Line 153 uses is_main_process() for wandb logging, but line 165 uses local_rank == 0 for logger output. In multi-node distributed setups, these may not be equivalent (e.g., local_rank == 0 is true on every node, while is_main_process() is typically true only on global rank 0).

Consider using consistent rank checks throughout.

Proposed fix

-        if self._dist_config.local_rank == 0:
+        if self._dist_config.is_main_process():
             logger.info(", ".join([f"{k.split('/')[1]}: {v:.3g}" for k, v in metrics.items()]))

bionemo-recipes/recipes/esm2_peft_te/tests/test_train_lora_two_gpus.py (1)
49-52: Add a Google-style docstring to the test.

Keeps docstring linting consistent even under relaxed test rules.

♻️ Suggested update

 @requires_multi_gpu
 def test_multi_gpu_train_te_ddp(tmp_path, recipe_path):
+    """Smoke-test multi-GPU DDP training for the recipe."""
     # Run 'accelerate launch train.py' as a subprocess

As per coding guidelines: Use Google-style docstrings following pydocstyle conventions.
bionemo-recipes/recipes/esm2_peft_te/train_lora_ddp.py (1)
111-112: Gate verbose prints to the main process.

These prints will fire on every rank and can spam logs; consider restricting them to the main process.

💡 Suggested fix

-    print("----- PEFT Model --------")
-    peft_model.print_trainable_parameters()
+    if dist_config.is_main_process():
+        print("----- PEFT Model --------")
+        peft_model.print_trainable_parameters()
@@
-    print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
+    if dist_config.is_main_process():
+        print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")

Also applies to: 198-198
bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py (1)
110-111: Gate verbose prints to the main process.

These prints execute on every rank; restricting to the main process avoids noisy multi-rank output.

💡 Suggested fix

-    print("----- PEFT Model --------")
-    peft_model.print_trainable_parameters()
+    if dist_config.is_main_process():
+        print("----- PEFT Model --------")
+        peft_model.print_trainable_parameters()
@@
-    print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
+    if dist_config.is_main_process():
+        print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")

Also applies to: 197-197
bionemo-recipes/recipes/esm2_peft_te/dataset.py (1)
31-44: Use a Google-style docstring for create_dataloader.

The current one-line docstring doesn't meet the Google-style requirement. Add Args:/Returns: sections.

As per coding guidelines, ensure all Python files follow Google-style docstrings (pydocstyle convention).

💡 Suggested fix

-    """Create a dataloader for the secondary structure dataset."""
+    """Create dataloaders for the secondary structure dataset.
+
+    Args:
+        distributed_config: Distributed training configuration.
+        use_sequence_packing: Whether to enable sequence packing.
+        tokenizer_name: Tokenizer identifier.
+        micro_batch_size: Training micro-batch size.
+        val_micro_batch_size: Validation micro-batch size.
+        num_workers: DataLoader worker count.
+        max_seq_length: Maximum sequence length.
+        stride: Tokenizer stride for overflow.
+        seed: RNG seed for shuffling.
+        ss3_classification: Whether to use SS3 labels (else SS8).
+        load_dataset_kwargs: Keyword arguments for datasets.load_dataset.
+
+    Returns:
+        Tuple of (train_dataloader, val_dataloader, train_dataset_or_sampler).
+    """

bionemo-recipes/recipes/esm2_peft_te/utils.py (1)
55-61: Use Google-style docstrings for public helpers.

These public functions should include Args:/Returns: sections to meet the docstring standard. Consider updating all helper docstrings similarly.

As per coding guidelines, ensure all Python files follow Google-style docstrings (pydocstyle convention).

💡 Example for one function

 def compute_accuracy(preds, labels, ignore_index=-100) -> tuple[int, int]:
-    """Calculate the accuracy."""
+    """Calculate the accuracy.
+
+    Args:
+        preds: Model logits or scores per token.
+        labels: Ground-truth label tensor.
+        ignore_index: Label value to exclude from accuracy.
+
+    Returns:
+        Tuple of (correct_count, total_count).
+    """

Also applies to: 64-82, 84-104, 106-135, 138-161, 163-170
| "megatron-fsdp", | ||
| "omegaconf", | ||
| "peft", | ||
| "peft @ git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora", |
🧩 Analysis chain
🏁 Script executed:
find . -name "pyproject.toml" -type f | grep -E "(esm2|bionemo)" | head -20Repository: NVIDIA/bionemo-framework
Length of output: 840
🏁 Script executed:
cat -n bionemo-recipes/models/esm2/pyproject.toml | head -30
Repository: NVIDIA/bionemo-framework
Length of output: 909
🏁 Script executed:
# Check for requirements.txt files mentioned in the review
find . -name "requirements*.txt" -type f | grep -E "(esm2|recipe)" | head -20Repository: NVIDIA/bionemo-framework
Length of output: 628
🏁 Script executed:
cat -n bionemo-recipes/recipes/esm2_peft_te/requirements.txt | grep -i peft -A 2 -B 2
Repository: NVIDIA/bionemo-framework
Length of output: 215
🏁 Script executed:
# Check other esm2 requirements files too
cat -n bionemo-recipes/recipes/esm2_accelerate_te/requirements.txt | grep -i peft -A 2 -B 2
Repository: NVIDIA/bionemo-framework
Length of output: 50
🏁 Script executed:
cat -n bionemo-recipes/recipes/esm2_native_te/requirements.txt | grep -i peft -A 2 -B 2
Repository: NVIDIA/bionemo-framework
Length of output: 50
Pin the git-based peft dependency to an immutable commit/tag.
The peft dependency uses a branch reference (@dev/ba/support-te-lora) instead of a commit SHA or released tag. Branch references are non-immutable and can change unpredictably, creating reproducibility and supply-chain risks. Use a pinned commit SHA or released tag to ensure deterministic builds.
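For example, the pyproject.toml entry could become the following; the placeholder stands in for a real commit SHA or release tag and is not filled in here:

-"peft @ git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora",
+"peft @ git+https://github.com/balvisio/peft.git@<commit-sha-or-tag>",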
🤖 Prompt for AI Agents
In `@bionemo-recipes/models/esm2/pyproject.toml` at line 17, The git-based peft
dependency string in pyproject.toml currently pins to a branch ("peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora"); replace the
branch ref with an immutable identifier (a specific commit SHA or an official
release tag) so the requirement becomes pinned to that commit/tag, update the
dependency entry accordingly, and verify the chosen SHA/tag exists in the peft
repo and builds correctly (look for the dependency line containing "peft @
git+https://github.com/balvisio/peft.git@...").
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> TokenClassifierOutput:
        """Forward pass for the token classification head.

        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        outputs = self.esm(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            inputs_embeds=inputs_embeds,
            **kwargs,
        )

        if outputs[0].dim() == 3:
            sequence_output = outputs[0]
        else:
            sequence_output = outputs[0].unsqueeze(0)

        sequence_output = sequence_output.transpose(1, 2)

        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()

            labels = labels.to(logits.device)
            loss = loss_fct(logits.reshape(-1, self.num_labels), labels.view(-1))
Packed/THD sequences will be convolved across boundaries.
When sequence packing is enabled (THD), outputs[0] becomes a flattened token stream. The Conv1d head will mix tokens from adjacent sequences, corrupting logits/loss. Consider rejecting packed inputs for this head or inserting explicit padding between sequences and handling it accordingly.
Possible guard to prevent incorrect packed-input usage
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> TokenClassifierOutput:
+       if kwargs.get("cu_seq_lens_q") is not None or kwargs.get("cu_seq_lens_k") is not None:
+           raise ValueError(
+               "NVEsmForConvTokenClassification does not support sequence-packed (THD) inputs; "
+               "disable use_sequence_packing."
+           )
        outputs = self.esm(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,

@@ -0,0 +1 @@
../esm2_native_te/checkpoint.py (no newline at end of file)
Replace cross-recipe symlink with a local/shared implementation.
Pointing this file at ../esm2_native_te/checkpoint.py makes the recipe depend on another recipe and bypasses per-file license headers. Please copy the checkpoint logic into this recipe, or move shared logic into a non-recipe common module and import from there.
As per coding guidelines, recipes in bionemo-recipes/recipes/ must be completely self-contained with no shared dependencies between recipes and no imports from other recipes, and ensure license headers are present in all **/*.{py,js,ts,jsx,tsx} files.
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/checkpoint.py` at line 1, The current
checkpoint.py is a cross-recipe symlink to another recipe's checkpoint logic;
replace it by copying the checkpoint implementation into this recipe (or extract
the shared logic into a new common module and import from there), add the proper
per-file license header to the copied file, remove any imports that reference
other recipes, and update all references to use the local checkpoint
implementation (i.e., the functions/classes originally provided by the external
checkpoint module) so the recipe is fully self-contained.
from transformers import (
    AutoTokenizer,
    DataCollatorForTokenClassification,
    DataCollatorWithFlattening,
)

from collator import TokenPackingDataset
from distributed_config import DistributedConfig
from utils import SS3_LABEL2ID, SS8_LABEL2ID
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the DataCollatorWithFlattening class definition
rg -n "class DataCollatorWithFlattening" -g '*.py' --type pyRepository: NVIDIA/bionemo-framework
Length of output: 409
🏁 Script executed:
#!/bin/bash
# Check the structure of the recipes/esm2_peft_te directory
find bionemo-recipes/recipes/esm2_peft_te -type f -name "*.py" | head -20
Repository: NVIDIA/bionemo-framework
Length of output: 778
🏁 Script executed:
#!/bin/bash
# Read the collator.py file to see what's defined there
cat -n bionemo-recipes/recipes/esm2_peft_te/collator.py 2>/dev/null | head -100
Repository: NVIDIA/bionemo-framework
Length of output: 5676
🏁 Script executed:
#!/bin/bash
# Check the dataset.py file around the import section
cat -n bionemo-recipes/recipes/esm2_peft_te/dataset.py 2>/dev/null | head -50
Repository: NVIDIA/bionemo-framework
Length of output: 2298
🏁 Script executed:
#!/bin/bash
# Check if collator.py exists in esm2_peft_te directory
ls -la bionemo-recipes/recipes/esm2_peft_te/ | grep collator
Repository: NVIDIA/bionemo-framework
Length of output: 162
🌐 Web query:
DataCollatorWithFlattening transformers library
💡 Result:
DataCollatorWithFlattening is a Hugging Face Transformers data collator for padding-free training. Instead of padding each example to a common length, it concatenates the whole mini-batch into one long sequence of shape [1, total_tokens], and (by default) returns input_ids, labels, and position_ids. It separates each original example inside the concatenated labels using separator_id (default -100, i.e., ignore index for loss). It can also optionally return seq_idx (which original sequence each token came from) and FlashAttention kwargs. [1]
Key params (from the docs): return_position_ids=True, separator_id=-100, return_flash_attn_kwargs=False, return_seq_idx=False. [1]
Caveat: because the batch becomes one long sequence, your attention implementation must be able to handle this “flattened batch” layout (it’s not the usual [batch, seq] padding/masking pattern). [1]
Example:
from transformers import DataCollatorWithFlattening
collator = DataCollatorWithFlattening(separator_id=-100, return_seq_idx=True)
# pass `collator` as the data_collator to Trainer / DataLoader

Source: Transformers "Data collator" docs (main branch). [1]
References:
[1] https://huggingface.co/docs/transformers/main/en/main_classes/data_collator
🏁 Script executed:
#!/bin/bash
# Check if there's a collator.py file in esm2_peft_te by searching more thoroughly
find bionemo-recipes/recipes/esm2_peft_te -name "collator.py"
Repository: NVIDIA/bionemo-framework
Length of output: 116
🏁 Script executed:
#!/bin/bash
# Check the imports in dataset.py more carefully and see if TokenPackingDataset is also from collator
cat -n bionemo-recipes/recipes/esm2_peft_te/dataset.py | head -35
Repository: NVIDIA/bionemo-framework
Length of output: 1567
Import DataCollatorWithFlattening from the local collator module.
A custom DataCollatorWithFlattening is defined in collator.py (via symlink to esm2_native_te/collator.py) with a recipe-specific implementation for THD format and Flash Attention support. The current import from transformers retrieves the official library version instead, which has a different signature and will cause runtime errors when instantiated with the custom parameters expected by this recipe. Import it from the local module to ensure the recipe uses its own self-contained implementation.
Suggested fix
-from transformers import (
- AutoTokenizer,
- DataCollatorForTokenClassification,
- DataCollatorWithFlattening,
-)
+from transformers import AutoTokenizer, DataCollatorForTokenClassification
@@
-from collator import TokenPackingDataset
+from collator import DataCollatorWithFlattening, TokenPackingDataset

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-from transformers import (
-    AutoTokenizer,
-    DataCollatorForTokenClassification,
-    DataCollatorWithFlattening,
-)
-from collator import TokenPackingDataset
+from transformers import AutoTokenizer, DataCollatorForTokenClassification
+from collator import DataCollatorWithFlattening, TokenPackingDataset
 from distributed_config import DistributedConfig
 from utils import SS3_LABEL2ID, SS8_LABEL2ID
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/dataset.py` around lines 20 - 28, The
file imports DataCollatorWithFlattening from transformers but this recipe
expects the local, recipe-specific implementation in collator.py; change the
import to pull DataCollatorWithFlattening from the local collator module (the
same place TokenPackingDataset is imported from) so that the
DataCollatorWithFlattening used by Dataset code matches the recipe-specific
signature and Flash Attention / THD-format parameters; update the import list to
reference collator.DataCollatorWithFlattening instead of
transformers.DataCollatorWithFlattening.
@@ -0,0 +1 @@
../esm2_native_te/distributed_config.py (no newline at end of file)
Avoid cross-recipe symlink for distributed config.
This pointer ties the recipe to another recipe’s config and skips the required per-file license header. Please inline the config here or move shared logic into a non-recipe common module.
As per coding guidelines, recipes in bionemo-recipes/recipes/ must be completely self-contained with no shared dependencies between recipes and no imports from other recipes, and ensure license headers are present in all **/*.{py,js,ts,jsx,tsx} files.
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/distributed_config.py` at line 1, The
file distributed_config.py currently points to
../esm2_native_te/distributed_config.py (cross-recipe symlink); replace this
pointer by inlining the distributed_config implementation into this recipe (copy
the code from esm2_native_te/distributed_config.py into this recipe's
distributed_config.py) and add the required per-file license header at the top,
or alternatively extract any truly shared utilities into a new common module
outside recipes (e.g., a shared package) and import that instead; ensure the new
distributed_config.py in this recipe contains no imports from other recipes and
includes the proper license header.
    inputs = tokenizer(
        sequences,
        max_length=max_seq_length,
        truncation=True,
        stride=stride,
        return_overflowing_tokens=infer_overflowing_aas,
        return_tensors="pt",
        padding=True,
    )

    overflow_map = inputs.pop("overflow_to_sample_mapping")
    num_samples = len(inputs["input_ids"])

    # inner batching over tokenizer outputs
    for j in range(0, num_samples, batch_size):
        sub_inputs = {k: v[j : j + batch_size].to(device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model(**sub_inputs)

        preds = outputs.logits.argmax(dim=-1)

        for k, (pred, input_ids) in enumerate(zip(preds, sub_inputs["input_ids"])):
            length = (input_ids != tokenizer.pad_token_id).sum().item()
            labels = "".join(id2label[i.item()] for i in pred[:length])

            predictions.append(labels)

            # map back to original record index
            original_idx = i + overflow_map[j + k].item()
            sequences_to_sample_mapping.append(original_idx)
🧩 Analysis chain
🏁 Script executed:
find . -name "infer.py" -path "*esm2_peft_te*" | head -5Repository: NVIDIA/bionemo-framework
Length of output: 115
🏁 Script executed:
cat -n bionemo-recipes/recipes/esm2_peft_te/infer.py | head -100
Repository: NVIDIA/bionemo-framework
Length of output: 4496
🌐 Web query:
Hugging Face Transformers tokenizer return_overflowing_tokens overflow_to_sample_mapping behavior
💡 Result:
- return_overflowing_tokens=True tells the tokenizer to split inputs longer than max_length into multiple "windows" (optionally overlapping by stride) and return them as additional rows in the output. (huggingface.co)
- When that happens, the returned batch is flattened from roughly (batch, overflows, seq_len) into (batch * overflows, seq_len), and overflow_to_sample_mapping is added so you can map each returned window back to the original example index within the input batch. Specifically, it's built by repeating each original index i exactly (# of windows produced for sample i) times. (huggingface.co)
- Important edge case: for sequence pairs with truncation strategy 'longest_first', Transformers warns that overflowing tokens aren't returned (so overflow-related outputs may be empty/missing even if truncation occurred). (discuss.huggingface.co)
- Practical gotcha with 🤗 Datasets: if you tokenize inside Dataset.map(..., num_proc>1), overflow_to_sample_mapping can be wrong because each worker computes indices relative to its shard (not the global dataset) and then results are concatenated. (github.com)
Citations:
- 1: https://huggingface.co/transformers/v4.3.3/_modules/transformers/tokenization_utils_fast.html?utm_source=openai
- 2: https://huggingface.co/transformers/v4.3.3/_modules/transformers/tokenization_utils_fast.html?utm_source=openai
- 3: https://discuss.huggingface.co/t/what-does-this-warning-mean-overflowing-tokens-are-not-returned-for-the-setting-you-have-chosen/11594?utm_source=openai
- 4: Datasets.map causes incorrect overflow_to_sample_mapping when used with tokenizers and small batch size huggingface/datasets#4755
Guard overflow_to_sample_mapping when overflow is disabled.
When return_overflowing_tokens=False, the tokenizer does not include overflow_to_sample_mapping in its output, causing a KeyError on line 56.
Suggested fix
- overflow_map = inputs.pop("overflow_to_sample_mapping")
+ if infer_overflowing_aas:
+ overflow_map = inputs.pop("overflow_to_sample_mapping")
+ else:
+ overflow_map = torch.arange(len(inputs["input_ids"]))

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
     inputs = tokenizer(
         sequences,
         max_length=max_seq_length,
         truncation=True,
         stride=stride,
         return_overflowing_tokens=infer_overflowing_aas,
         return_tensors="pt",
         padding=True,
     )

-    overflow_map = inputs.pop("overflow_to_sample_mapping")
+    if infer_overflowing_aas:
+        overflow_map = inputs.pop("overflow_to_sample_mapping")
+    else:
+        overflow_map = torch.arange(len(inputs["input_ids"]))
     num_samples = len(inputs["input_ids"])

     # inner batching over tokenizer outputs
     for j in range(0, num_samples, batch_size):
         sub_inputs = {k: v[j : j + batch_size].to(device) for k, v in inputs.items()}

         with torch.no_grad():
             outputs = model(**sub_inputs)

         preds = outputs.logits.argmax(dim=-1)

         for k, (pred, input_ids) in enumerate(zip(preds, sub_inputs["input_ids"])):
             length = (input_ids != tokenizer.pad_token_id).sum().item()
             labels = "".join(id2label[i.item()] for i in pred[:length])

             predictions.append(labels)

             # map back to original record index
             original_idx = i + overflow_map[j + k].item()
             sequences_to_sample_mapping.append(original_idx)
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/infer.py` around lines 46 - 76, The code
assumes inputs contains "overflow_to_sample_mapping" but when tokenizer was
called with return_overflowing_tokens/infer_overflowing_aas=False that key is
missing; before using overflow_map (variable overflow_map) in the inner loop
(inside the block that constructs sub_inputs and iterates preds) guard access by
checking if "overflow_to_sample_mapping" is in inputs and set overflow_map =
inputs.pop("overflow_to_sample_mapping", None); when overflow_map is None,
compute original_idx using i (outer sample index) directly (or j+k) without
indexing overflow_map, and keep appending to sequences_to_sample_mapping and
predictions as before to avoid KeyError; update references in the loop where
original_idx is assigned (original_idx = i + overflow_map[j + k].item()) to
handle both cases.
    peft_model = PeftModel.from_pretrained(base_model, args.peft_model_config_dir)
    peft_model = peft_model.to("cuda").eval()

    tokenizer = AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and read the infer.py file
find . -name "infer.py" -path "*/esm2_peft_te/*" | head -20Repository: NVIDIA/bionemo-framework
Length of output: 115
🏁 Script executed:
# Get the size to determine how much we can read
wc -l bionemo-recipes/recipes/esm2_peft_te/infer.py
Repository: NVIDIA/bionemo-framework
Length of output: 117
🌐 Web query:
Transformers documentation AutoModelForTokenClassification tokenizer selection best practices
💡 Result:
Tokenizer selection best practices for AutoModelForTokenClassification (Transformers)
- Always load the tokenizer from the same checkpoint as the model
  - Use AutoTokenizer.from_pretrained(model_name_or_path) that matches the checkpoint you pass to AutoModelForTokenClassification.from_pretrained(...) so the vocab, normalization, and special tokens align with the model. (This is the default pattern shown throughout the token classification task docs.) [1]
- Prefer a fast tokenizer (use_fast=True) for token classification
  - Token classification typically requires word-to-subword alignment, done via is_split_into_words=True plus word_ids(). The official token classification guide uses word_ids() for label alignment. [1]
  - If you accidentally end up with a Python ("slow") tokenizer, word_ids() may be unavailable (common failure mode). [5]
- Use the recommended label-alignment strategy
  - Realign labels after subword tokenization by:
    - mapping tokens to word indices with word_ids(),
    - setting labels for special tokens to -100,
    - labeling only the first subtoken of each word (others -100). [1]
- Pad/truncate correctly (usually: dynamic padding + truncation)
  - For training/inference batches, rely on the standard tokenizer args padding/truncation/max_length as needed. [2]
  - For token classification training, prefer dynamic padding per batch via DataCollatorForTokenClassification(tokenizer=tokenizer) (recommended in the task guide). [1]
- If you need spans (start/end char offsets), request offsets from the tokenizer
  - Fast tokenizers can return return_offsets_mapping=True to map tokens back to character spans (special tokens typically map to (0, 0)). [3]
- RoBERTa / byte-level BPE gotcha when using pre-tokenized words
  - If you call the tokenizer with is_split_into_words=True on RoBERTa-like tokenizers, instantiate with add_prefix_space=True (explicitly noted in the RoBERTa docs). [4]

Minimal, "safe default" loading snippet

from transformers import AutoTokenizer, AutoModelForTokenClassification

ckpt = "your-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(ckpt, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(ckpt)

Key takeaway: for token classification, the "best" tokenizer is almost always the matching checkpoint's fast tokenizer, because alignment utilities like word_ids()/offsets are central to correct training and evaluation. [1] [5]
🏁 Script executed:
cat -n bionemo-recipes/recipes/esm2_peft_te/infer.py
Repository: NVIDIA/bionemo-framework
Length of output: 5556
🏁 Script executed:
# Check the Hydra config structure
find bionemo-recipes/recipes/esm2_peft_te -name "*.yaml" | head -10Repository: NVIDIA/bionemo-framework
Length of output: 467
🏁 Script executed:
# Read the default Hydra config
cat bionemo-recipes/recipes/esm2_peft_te/hydra_config/L0_sanity_infer.yaml
Repository: NVIDIA/bionemo-framework
Length of output: 384
🏁 Script executed:
cat bionemo-recipes/recipes/esm2_peft_te/hydra_config/defaults_infer.yaml
Repository: NVIDIA/bionemo-framework
Length of output: 334
🏁 Script executed:
# Also check other training configs to see if tokenizer_name pattern is used elsewhere
grep -r "tokenizer_name" bionemo-recipes/recipes/esm2_peft_te/ 2>/dev/null || echo "No tokenizer_name found"Repository: NVIDIA/bionemo-framework
Length of output: 646
🏁 Script executed:
# Check if there's a pattern in other recipes
find bionemo-recipes/recipes -name "*.py" -type f -exec grep -l "tokenizer_name\|AutoTokenizer.from_pretrained" {} \; | head -10Repository: NVIDIA/bionemo-framework
Length of output: 767
🏁 Script executed:
# Check how dataset.py uses tokenizer_name to understand the pattern
sed -n '1,50p' bionemo-recipes/recipes/esm2_peft_te/dataset.py
Repository: NVIDIA/bionemo-framework
Length of output: 1948
🏁 Script executed:
# Also check if there's any guidance on tokenizer selection in training configs
grep -A5 -B5 "tokenizer_name" bionemo-recipes/recipes/esm2_peft_te/hydra_config/defaults.yamlRepository: NVIDIA/bionemo-framework
Length of output: 266
Derive tokenizer from the model checkpoint, not hardcoded.
Line 108 hardcodes "nvidia/esm2_t48_15B_UR50D" while the model is loaded from args.model_tag (e.g., "nvidia/esm2_t6_8M_UR50D" in L0_sanity_infer.yaml). Per Transformers documentation, the tokenizer must match the model checkpoint for vocabulary and special tokens to align. This mismatch can silently corrupt results.
The training scripts in this recipe correctly use tokenizer_name: ${model_tag} in the config. Update infer.py to follow the same pattern:
Suggested update
- tokenizer = AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")
+ tokenizer_name = getattr(args, "tokenizer_name", None) or args.model_tag
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)

Also add tokenizer_name: ${model_tag} to hydra_config/defaults_infer.yaml to allow overriding when needed (as done in training configs like L1_fb_15B.yaml).
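For the config side, a one-line sketch of what defaults_infer.yaml could gain, assuming model_tag is already defined in that file as it is in the training configs:

tokenizer_name: ${model_tag}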
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tokenizer = AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D") | |
| tokenizer_name = getattr(args, "tokenizer_name", None) or args.model_tag | |
| tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True) |
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/infer.py` at line 108, Replace the
hardcoded tokenizer checkpoint with the model-derived tokenizer: modify the
AutoTokenizer.from_pretrained call (where tokenizer =
AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")) to load from the
runtime/config value (e.g., args.model_tag or a config field tokenizer_name) so
the tokenizer matches the model loaded elsewhere in infer.py; also add
tokenizer_name: ${model_tag} to hydra_config/defaults_infer.yaml so the
tokenizer checkpoint can be overridden via config. Ensure you reference the same
symbol used to load the model (args.model_tag or the config object) when calling
AutoTokenizer.from_pretrained.
hydra-core
peft @ git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora
torch
torchao!=0.14.0
torchmetrics
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and read its contents
cat bionemo-recipes/recipes/esm2_peft_te/requirements.txt
Repository: NVIDIA/bionemo-framework
Length of output: 245
🏁 Script executed:
# Search for any references to this peft dependency in the codebase
rg "balvisio/peft" --type txt --type py --type mdRepository: NVIDIA/bionemo-framework
Length of output: 192
🏁 Script executed:
# Check if there's any documentation about this branch or dependency
rg "dev/ba/support-te-lora"Repository: NVIDIA/bionemo-framework
Length of output: 313
🏁 Script executed:
# Look for similar git-based dependencies in recipes
fd "requirements.txt" bionemo-recipes/recipes/ -exec grep -l "git+" {} \;Repository: NVIDIA/bionemo-framework
Length of output: 266
🏁 Script executed:
# Search for documentation or comments about this branch
rg "support-te-lora" --type md --type txt --type pyRepository: NVIDIA/bionemo-framework
Length of output: 192
🏁 Script executed:
# Look for README or documentation files in the recipe directory
find bionemo-recipes/recipes/esm2_peft_te -type f \( -name "README*" -o -name "*.md" -o -name "SETUP*" \)
Repository: NVIDIA/bionemo-framework
Length of output: 274
🏁 Script executed:
# Check if there are any inline comments or documentation about peft dependency
rg -B 3 -A 3 "balvisio/peft" bionemo-recipes/Repository: NVIDIA/bionemo-framework
Length of output: 1001
🏁 Script executed:
# Check the recipe directory structure to understand what's included
ls -la bionemo-recipes/recipes/esm2_peft_te/
Repository: NVIDIA/bionemo-framework
Length of output: 1560
🏁 Script executed:
# Read the README to understand why the custom branch is needed
cat bionemo-recipes/recipes/esm2_peft_te/README.md
Repository: NVIDIA/bionemo-framework
Length of output: 4322
🏁 Script executed:
# Check if there are any comments in the training scripts about this dependency
grep -n "peft\|TE\|LoRA" bionemo-recipes/recipes/esm2_peft_te/train_lora_*.py | head -20Repository: NVIDIA/bionemo-framework
Length of output: 2534
🏁 Script executed:
# Look for any setup/install documentation
find bionemo-recipes -name "SETUP*" -o -name "INSTALL*" | head -5Repository: NVIDIA/bionemo-framework
Length of output: 50
Pin the git-based peft dependency to an immutable commit or tag.
The dependency on branch dev/ba/support-te-lora is non-reproducible and increases supply-chain risk. Branches can be force-pushed, rebased, or deleted without notice. Pin to a specific commit SHA (or a released tag) to ensure deterministic builds and maintain reproducibility required for production recipes.
🔧 Example fix
-peft @ git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora
+peft @ git+https://github.com/balvisio/peft.git@<COMMIT_SHA>

🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/requirements.txt` around lines 2 - 6,
The git-based peft dependency is pinned to a mutable branch; update the
requirement line "peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora" to reference an
immutable identifier by replacing the branch name with a specific commit SHA or
released tag (e.g., @<commit-sha> or `@vX.Y.Z`) so builds are reproducible and
supply-chain safe; ensure the updated string remains in requirements.txt as
"peft @ git+https://github.com/balvisio/peft.git@<commit-or-tag>".
@@ -0,0 +1 @@
../esm2_native_te/scheduler.py (no newline at end of file)
Cross-recipe dependency violates recipe isolation rules.
This file effectively redirects to bionemo-recipes/recipes/esm2_native_te/scheduler.py, which breaks the rule that recipes must be self-contained and must not import or depend on other recipes. Please copy the needed logic into this recipe or provide a local implementation.
As per coding guidelines: “bionemo-recipes/recipes/**/*.py must be completely self-contained with no shared dependencies between recipes and no imports from other recipes.”
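If the recipe only needs a simple warmup schedule for fine-tuning, a minimal self-contained sketch could look like this; the function name and warmup/decay behavior are assumptions, so mirror whatever esm2_native_te/scheduler.py actually exports:

import torch


def get_linear_warmup_scheduler(optimizer: torch.optim.Optimizer, num_warmup_steps: int, num_training_steps: int):
    """Linear warmup followed by linear decay to zero (illustrative placeholder)."""

    def lr_lambda(step: int) -> float:
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)  # ramp up from 0 to the base LR
        remaining = num_training_steps - step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))  # decay back to 0

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)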
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/scheduler.py` at line 1, This file
currently redirects to or imports from esm2_native_te.scheduler which violates
recipe isolation; replace that cross-recipe dependency by copying the required
scheduler logic into this recipe or implementing a local equivalent. Identify
the exported symbols you depend on (e.g., Scheduler class, create_scheduler or
get_scheduler factory function, and any helper functions like schedule_task or
init_scheduler) from the esm2_native_te implementation, reproduce their behavior
locally inside bionemo-recipes/recipes/esm2_peft_te/scheduler.py, update any
local imports to use the new local implementations, and remove any import lines
that reference esm2_native_te.scheduler so the recipe is fully self-contained.
Fix invalid Python file (current content will raise SyntaxError).
A bare path line is not valid Python code and will fail on import/execution. Replace this with a real implementation or a valid import/alias.
✅ Example fix (inline implementation placeholder)
-../esm2_native_te/scheduler.py
+"""Scheduler utilities for esm2_peft_te."""
+
+# TODO: Implement scheduler logic here (keep this recipe self-contained).

🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/scheduler.py` at line 1, The file
currently contains a bare path string which raises a SyntaxError; replace it
with a valid Python implementation or a re-export import: remove the literal
path and add an import that re-exports the native scheduler (e.g., import or
from import of esm2_native_te.scheduler) or implement a minimal wrapper
function/class matching this package's expected API (e.g., functions/classes
used elsewhere that reference scheduler.py) so the module can be imported
without error; update symbols to match callers if you implement a wrapper.
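A minimal sketch of the vendoring option, assuming the native recipe's scheduler.py sits one directory over as the diff suggests (the copied file would still need this repo's per-file license header):
cd bionemo-recipes/recipes/esm2_peft_te
rm scheduler.py                                   # drop the bare-path/symlink file
cp ../esm2_native_te/scheduler.py scheduler.py    # vendor a real copy into this recipe
After that, the recipe imports its own scheduler module and no longer depends on esm2_native_te.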
import hydra
import peft
import torch
from modeling_esm_te import NVEsmForConvTokenClassification
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all instances of modeling_esm_te.py
echo "=== Searching for modeling_esm_te.py ==="
fd -a "modeling_esm_te.py"
# Search for NVEsmForConvTokenClassification definition
echo -e "\n=== Searching for NVEsmForConvTokenClassification definition ==="
rg "class NVEsmForConvTokenClassification" -g '*.py'
# Check what's in esm2_peft_te recipe directory
echo -e "\n=== Contents of esm2_peft_te recipe ==="
fd -a . "bionemo-recipes/recipes/esm2_peft_te" --type f | head -30
# Check imports from other esm2 recipes
echo -e "\n=== Checking esm2_native_te recipe structure ==="
fd -a . "bionemo-recipes/recipes/esm2_native_te" --type f | grep "modeling_esm" | head -20Repository: NVIDIA/bionemo-framework
Length of output: 3711
🏁 Script executed:
#!/bin/bash
# View the train_lora_convnet.py file around the import
echo "=== train_lora_convnet.py imports ==="
head -40 "bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py"
# Check if there's a setup.py or pyproject.toml that adds models to path
echo -e "\n=== Check esm2_peft_te for path configuration ==="
find "bionemo-recipes/recipes/esm2_peft_te" -name "setup.py" -o -name "pyproject.toml" -o -name "__init__.py" | head -10
# Check if recipe root has sys.path manipulation
echo -e "\n=== Check for PYTHONPATH or sys.path in train files ==="
grep -r "sys.path" "bionemo-recipes/recipes/esm2_peft_te" --include="*.py" | head -10
# Check parent recipe esm2_native_te for comparison
echo -e "\n=== Compare with esm2_native_te imports ==="
head -40 "bionemo-recipes/recipes/esm2_native_te/train_lora_ddp.py" 2>/dev/null || echo "File not found"
# Check if models/esm2 is meant to be a shared package
echo -e "\n=== Check models/esm2 structure ==="
ls -la "bionemo-recipes/models/esm2/src/" 2>/dev/null || echo "Directory structure:"
find "bionemo-recipes/models/esm2" -name "__init__.py" | head -5Repository: NVIDIA/bionemo-framework
Length of output: 2095
🏁 Script executed:
#!/bin/bash
# View the Dockerfile
echo "=== Dockerfile ==="
cat "bionemo-recipes/recipes/esm2_peft_te/Dockerfile"
# Check if there's a docker-compose or similar
echo -e "\n=== Docker/run configuration ==="
find "bionemo-recipes/recipes/esm2_peft_te" -name "docker-compose*" -o -name "*.sh" | head -10
# Check if there's any entrypoint script
echo -e "\n=== Entrypoint/wrapper scripts ==="
find "bionemo-recipes/recipes/esm2_peft_te" -type f -name "*.sh" -exec head -5 {} \;
# Check other recipes for comparison on how they handle modeling imports
echo -e "\n=== Check esm2_accelerate_te imports ==="
grep "from modeling_esm_te" "bionemo-recipes/recipes/esm2_accelerate_te/train_lora_ddp.py" 2>/dev/null || echo "Not found"
# Check README for setup instructions
echo -e "\n=== README setup instructions ==="
head -100 "bionemo-recipes/recipes/esm2_peft_te/README.md" | grep -A 10 -B 5 "PYTHONPATH\|sys.path\|modeling\|import" || echo "No specific setup found"Repository: NVIDIA/bionemo-framework
Length of output: 1020
🏁 Script executed:
#!/bin/bash
# Check requirements.txt
echo "=== esm2_peft_te requirements.txt ==="
cat "bionemo-recipes/recipes/esm2_peft_te/requirements.txt"
# Check if models/esm2 has setup.py or pyproject.toml
echo -e "\n=== models/esm2 setup configuration ==="
ls -la "bionemo-recipes/models/esm2/" | grep -E "setup|pyproject"
# Check if models/esm2 is installable
echo -e "\n=== models/esm2 pyproject.toml ==="
cat "bionemo-recipes/models/esm2/pyproject.toml" 2>/dev/null || echo "Not found"
# Check the parent bionemo-recipes for any workspace or setup
echo -e "\n=== Root pyproject.toml or setup ==="
head -50 "bionemo-recipes/pyproject.toml" 2>/dev/null || echo "Not found"
# Check esm2_native_te to see how it handles this
echo -e "\n=== esm2_native_te Dockerfile ==="
cat "bionemo-recipes/recipes/esm2_native_te/Dockerfile" 2>/dev/null | head -30 || echo "File not found"
# Check if there's a setup.py in models/esm2
echo -e "\n=== models/esm2 structure ==="
ls -la "bionemo-recipes/models/esm2/"Repository: NVIDIA/bionemo-framework
Length of output: 2186
🏁 Script executed:
#!/bin/bash
# Check if models/esm2 is meant to be installed as a dependency
echo "=== Check if esm package should be in requirements ==="
grep -r "esm" "bionemo-recipes/recipes/esm2_peft_te/requirements.txt"
# Check if there's a higher-level requirements or setup
echo -e "\n=== Check for parent-level requirements ==="
find "bionemo-recipes" -maxdepth 2 -name "requirements.txt" -o -name "pyproject.toml" | head -10
# Verify how esm2_native_te actually handles this
echo -e "\n=== esm2_native_te requirements ==="
cat "bionemo-recipes/recipes/esm2_native_te/requirements.txt" 2>/dev/null || echo "Not found"
# Check if there's any documentation about how recipes should be run
echo -e "\n=== Top-level README ==="
head -100 "bionemo-recipes/README.md" 2>/dev/null | grep -A 5 -B 5 "setup\|install\|PYTHONPATH" || echo "Not found"
# Look for any CI/runner configuration
echo -e "\n=== GitHub workflows or CI config ==="
find . -path "./.git" -prune -o -name "*.yml" -o -name "*.yaml" | grep -E "github|gitlab|ci" | head -5Repository: NVIDIA/bionemo-framework
Length of output: 1125
🏁 Script executed:
#!/bin/bash
# Check if there's an install script or setup docs
echo "=== Check esm2_peft_te directory for setup docs ==="
find "bionemo-recipes/recipes/esm2_peft_te" -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "PYTHONPATH\|sys.path\|modeling" 2>/dev/null || echo "None found"
# Check if the import works with current setup by looking at test runs
echo -e "\n=== Check test configuration for PYTHONPATH ==="
cat "bionemo-recipes/recipes/esm2_peft_te/tests/conftest.py" 2>/dev/null | head -30
# Check what the actual test file imports
echo -e "\n=== Check test imports ==="
head -40 "bionemo-recipes/recipes/esm2_peft_te/tests/"*.py 2>/dev/null | grep -A 5 -B 5 "import"
# Verify: does the Dockerfile actually work as-is?
echo -e "\n=== Double check Dockerfile details ==="
cat "bionemo-recipes/recipes/esm2_peft_te/Dockerfile"Repository: NVIDIA/bionemo-framework
Length of output: 3749
🏁 Script executed:
#!/bin/bash
# Check how training is supposed to be launched in the container
echo "=== Check for entrypoint or launch instructions ==="
grep -r "ENTRYPOINT\|CMD" "bionemo-recipes/recipes/esm2_peft_te/" --include="Dockerfile" --include="*.md"
# Check if there's an actual way this import resolves
echo -e "\n=== Verify: does train_lora_convnet.py have any sys.path manipulation? ==="
head -100 "bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py" | grep -E "sys.path|PYTHONPATH"
# Check if the models/esm2 package is supposed to be installed during Dockerfile build
echo -e "\n=== Check if pip installs from models/esm2 ==="
grep -r "pip install.*models\|pip install.*esm" "bionemo-recipes/recipes/esm2_peft_te/Dockerfile"
# Check esm2_native_te for comparison - how does it work?
echo -e "\n=== esm2_native_te Dockerfile full ==="
cat "bionemo-recipes/recipes/esm2_native_te/Dockerfile"
# Final check: any way to resolve modeling_esm_te locally in peft_te recipe?
echo -e "\n=== Is there a modeling_esm_te.py in esm2_peft_te? ==="
find "bionemo-recipes/recipes/esm2_peft_te" -name "modeling_esm_te.py" -o -name "esm_nv.py" | head -5Repository: NVIDIA/bionemo-framework
Length of output: 781
🏁 Script executed:
#!/bin/bash
# Let me verify the exact directory structure to confirm the violation
echo "=== Complete verification of import resolution ==="
echo "1. Location of modeling_esm_te.py:"
find "bionemo-recipes" -name "modeling_esm_te.py" -type f
echo -e "\n2. Is it copied into esm2_peft_te Dockerfile?"
echo "Checking Dockerfile COPY commands:"
grep "COPY.*modeling\|COPY.*models" "bionemo-recipes/recipes/esm2_peft_te/Dockerfile" || echo "No models/ copy found"
echo -e "\n3. Is models/esm2 in esm2_peft_te's requirements?"
cat "bionemo-recipes/recipes/esm2_peft_te/requirements.txt"
echo -e "\n4. Check if any other esm2_* recipe handles this correctly"
grep -r "from modeling_esm_te\|from esm.modeling" "bionemo-recipes/recipes/" --include="*.py" | grep -v "example_" | head -10
echo -e "\n5. Verify the import statement one more time:"
grep -n "from modeling_esm_te" "bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py"Repository: NVIDIA/bionemo-framework
Length of output: 830
Fix self-contained recipe dependency: modeling_esm_te import.
The import from modeling_esm_te import NVEsmForConvTokenClassification (line 25) depends on code in bionemo-recipes/models/esm2/src/esm/modeling_esm_te.py, which is outside this recipe. The Dockerfile does not copy or install this module, and requirements.txt does not include the esm package. This violates the self-containment rule: recipes must not import from other recipes or shared code directories.
Either include modeling_esm_te.py locally in this recipe or install the esm package from models/esm2/ as a documented dependency.
🤖 Prompt for AI Agents
In `@bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py` at line 25: The
recipe imports NVEsmForConvTokenClassification from modeling_esm_te in
train_lora_convnet.py which pulls code from another recipe and breaks
self-containment; either vendor the required module into this recipe (add
modeling_esm_te.py with NVEsmForConvTokenClassification implementation alongside
train_lora_convnet.py and update the package/module path) or declare and install
the external esm package from models/esm2 (add it to requirements.txt and update
the Dockerfile to copy/install that package) so that the import in
train_lora_convnet.py resolves without referencing code outside this recipe.
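If the package route is preferred over vendoring, a rough sketch (assuming models/esm2 remains pip-installable via its pyproject.toml, as the analysis above indicates) would be:
# Install the esm package into the recipe image, e.g. from the repo root or a COPY'd build context
pip install ./bionemo-recipes/models/esm2
# The import may then need to use the packaged path, e.g.:
# from esm.modeling_esm_te import NVEsmForConvTokenClassification
Whichever option is chosen, it should be reflected in requirements.txt and the Dockerfile so the import resolves inside the container.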
Description
This PR adds a recipe to perform LoRA fine-tuning of the Evo2 model. It provides support for DDP and sequence packing. It also contains a file, infer.py, that shows how to do inference from a fine-tuned checkpoint. The PR contains the datasets that were used to train the models; eventually we can convert them into HF datasets.
It does not add support for FSDP or FP8 yet.
Usage
cd bionemo-recipes/recipes/evo2_peft_te/
python train_lora_ddp.py
For more information on usage, see the README.
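For the DDP path, a hypothetical multi-GPU launch (the torchrun flags below are placeholders that assume the usual single-node setup; the README remains the authoritative reference):
torchrun --nproc_per_node=8 train_lora_ddp.py   # single node, 8 GPUs; adjust to your hardware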
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.
Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline. For more details, see CONTRIBUTING.
Note
By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.
If the pull request is authorized, it will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123).
Otherwise, an /ok to test comment on the pull request is needed to trigger CI. This will need to be done for each new commit.
Triggering Code Rabbit AI Review
To trigger a code review from CodeRabbit, comment on the pull request with one of its review commands.
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores