fix: prevent UnboundLocalError when saving merged models#372
Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies src/heretic/model.py to set _hf_peft_config_loaded = False after merging LoRA adapters, and introduces a dtype attribute to handle model resetting when a prior load was interrupted. Feedback from the reviewer indicates that the changes related to self.dtype and handling self.model is None in reset_model violate the repository style guide as they are unrelated to the main objective of the PR. Furthermore, using self.dtype when self.model is None could lead to an AttributeError if the model initialization was interrupted before self.dtype was set.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| peft_config: LoraConfig | ||
| dtype: torch.dtype |
There was a problem hiding this comment.
This field is part of the unrelated changes introduced in this PR. According to the repository style guide, PRs must not change existing code unless the changes are directly related to the PR. Please revert this change and submit it in a separate PR if needed.
| peft_config: LoraConfig | |
| dtype: torch.dtype | |
| peft_config: LoraConfig |
References
- PRs must not change existing code unless the changes are directly related to the PR. (link)
| ) | ||
| self.dtype = self.model.dtype |
There was a problem hiding this comment.
This assignment is part of the unrelated changes introduced in this PR. According to the repository style guide, PRs must not change existing code unless the changes are directly related to the PR. Please revert this change and submit it in a separate PR if needed.
| ) | |
| self.dtype = self.model.dtype | |
| ) |
References
- PRs must not change existing code unless the changes are directly related to the PR. (link)
| # If a prior model load was interrupted/cancelled mid-process, self.model will be None. | ||
| current_model = None | ||
| if self.model is not None: | ||
| current_model = getattr(self.model.config, "name_or_path", None) | ||
|
|
||
| if current_model == self.settings.model and not self.needs_reload: | ||
| # Reset LoRA adapters to zero (identity transformation) | ||
| # Reset LoRA adapters to zero (identity transformation). | ||
| for name, module in self.model.named_modules(): | ||
| if "lora_B" in name and hasattr(module, "weight"): | ||
| torch.nn.init.zeros_(module.weight) | ||
| return | ||
|
|
||
| dtype = self.model.dtype | ||
|
|
||
| # Purge existing model object from memory to make space. | ||
| self.model = None # ty:ignore[invalid-assignment] | ||
| empty_cache() | ||
|
|
||
| quantization_config = self._get_quantization_config(str(dtype).split(".")[-1]) | ||
| quantization_config = self._get_quantization_config( | ||
| str(self.dtype).split(".")[-1] | ||
| ) | ||
|
|
||
| # Build kwargs, only include quantization_config if it's not None | ||
| # Build kwargs, only include quantization_config if it's not None. | ||
| extra_kwargs = {} | ||
| if quantization_config is not None: | ||
| extra_kwargs["quantization_config"] = quantization_config | ||
|
|
||
| self.model = get_model_class(self.settings.model).from_pretrained( | ||
| self.settings.model, | ||
| dtype=dtype, | ||
| dtype=self.dtype, | ||
| device_map=self.settings.device_map, | ||
| max_memory=self.max_memory, | ||
| trust_remote_code=self.trusted_models.get(self.settings.model), |
There was a problem hiding this comment.
These changes in reset_model (handling self.model is None and introducing self.dtype) and the minor comment formatting changes are unrelated to the main objective of fixing the UnboundLocalError when saving merged models.
According to the Repository Style Guide (rules 9-11):
- Pull requests should implement one change, and one change only.
- PRs containing multiple semantically independent changes must be split into multiple PRs.
- PRs must not change existing code unless the changes are directly related to the PR. This includes changes to formatting and comments.
Additionally, if self.model is None due to an interrupted or cancelled load, self.dtype might not have been initialized in __init__, which would cause an AttributeError when accessing self.dtype on line 341.
Please revert these unrelated changes and submit them in a separate pull request if needed.
current_model = getattr(self.model.config, "name_or_path", None)
if current_model == self.settings.model and not self.needs_reload:
# Reset LoRA adapters to zero (identity transformation)
for name, module in self.model.named_modules():
if "lora_B" in name and hasattr(module, "weight"):
torch.nn.init.zeros_(module.weight)
return
dtype = self.model.dtype
# Purge existing model object from memory to make space.
self.model = None # ty:ignore[invalid-assignment]
empty_cache()
quantization_config = self._get_quantization_config(str(dtype).split(".")[-1])
# Build kwargs, only include quantization_config if it's not None
extra_kwargs = {}
if quantization_config is not None:
extra_kwargs["quantization_config"] = quantization_config
self.model = get_model_class(self.settings.model).from_pretrained(
self.settings.model,
dtype=dtype,
device_map=self.settings.device_map,
max_memory=self.max_memory,
trust_remote_code=self.trusted_models.get(self.settings.model),References
- Pull requests should implement one change, and one change only. PRs containing multiple semantically independent changes must be split into multiple PRs. PRs must not change existing code unless the changes are directly related to the PR. (link)
Fixes #289.
This PR resolves the
UnboundLocalError: local variable 'active_adapters' referenced before assignmentthat occurs when saving a merged model (strategy == "merged").Cause
When
merge_and_unload()is called, the LoRA adapters are integrated into the base model weights, and the adapter modules (BaseTunerLayer) are removed. However, the model still retains_hf_peft_config_loaded = True.When saving the merged model with
save_pretrained(),transformersthinks it is still a PEFT model and attempts to save the adapters, callingactive_adapters()which fails withUnboundLocalErrorsince there are no tuner layers.Fix
By setting
merged_model._hf_peft_config_loaded = Falseimmediately aftermerge_and_unload(), we ensure thatsave_pretrained()correctly treats the merged model as a plain base model.