
Add PEFT adapter format conversion support #48

Open
dapopov-st wants to merge 1 commit into main from eval-peft-support

Conversation


dapopov-st (Contributor) commented Apr 21, 2025

Add PEFT Adapter Support for Model Evaluation

This PR adds utilities for handling PEFT adapters during evaluation to avoid the "device meta is invalid" safetensors error that occurs when evaluating fine-tuned models with adapters.

Changes

  1. Added two utility functions to llmfoundry/command_utils/eval.py:
    • convert_peft_adapter_format(): Converts safetensors adapter files to bin format (only if needed), and always updates adapter_config.json so it references .bin files instead of .safetensors files.
    • restore_safetensors_after_eval(): Restores the original files after evaluation (only if a backup exists), and always updates adapter_config.json so it references .safetensors files instead of .bin files.

Problem Solved

When evaluating models with PEFT adapters, the evaluation often fails with:

safetensors_rust.SafetensorError: device meta is invalid

This happens because safetensors files contain device metadata that becomes invalid when loaded in different environments.
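
For context on where that metadata lives: a safetensors file starts with an 8-byte little-endian header length, followed by a JSON header (which can carry a free-form `__metadata__` string dict alongside the per-tensor dtype/shape/offset entries) and then the raw tensor bytes. A small stdlib sketch of that layout, using made-up tensor contents:

```python
import json
import struct

# Build a tiny file in the safetensors layout: 8-byte LE header length,
# JSON header, then raw data. Contents here are illustrative only.
header = {
    "__metadata__": {"format": "pt"},  # free-form string metadata
    "weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00" * 8

# Parse it back the way a loader would.
(header_len,) = struct.unpack("<Q", blob[:8])
parsed = json.loads(blob[8 : 8 + header_len])
print(parsed["__metadata__"])  # -> {'format': 'pt'}
```

Inspecting this header on a failing adapter file is a quick way to see what metadata the checkpoint actually carries before converting it.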

Required Manual Change

IMPORTANT: You need to manually rename scripts/eval/yamls/hf_lora_eval.yml to scripts/eval/yamls/hf_lora_eval.yaml in your environment. This PR does not include this renamed file.

Testing Note

These functions were tested in the local environment using local_llama_training_instruct.py but have not yet been tested in the Modal cloud environment. The functions are designed to be environment-agnostic, using only standard Python libraries and conditional logic to handle file operations safely.

Suggested Integration Guide for Modal Environment

To integrate these changes in your Modal workflow:

@app.function(gpu=TRAINING_GPU, image=image, timeout=3600, secrets=[Secret.from_name("LRG")],
              volumes={MODEL_CHECKPOINT_VOLUME_MOUNT_PATH: MODEL_CHECKPOINT_VOLUME},
              concurrency_limit=1)
def evaluate_model(checkpoint_path: str):
    import subprocess, os
    from pathlib import Path
    
    # Import the adapter conversion utilities
    from llmfoundry.command_utils.eval import convert_peft_adapter_format, restore_safetensors_after_eval
    
    os.chdir("/llm-foundry/scripts")
    print(f"Working directory: {os.getcwd()}")
    
    model_path = Path(MODEL_CHECKPOINT_VOLUME_MOUNT_PATH)/checkpoint_path
    save_path = model_path/"evals"  # Create evals subfolder path
    
    # Using IS_PEFT global variable as in our local implementation
    if IS_PEFT:
        adapter_config_path = model_path/"adapter_config.json"
        if not adapter_config_path.exists():
            raise FileNotFoundError(f"PEFT adapter config not found at {adapter_config_path}. Check IS_PEFT setting or model path.")
        print("PEFT adapter detected, converting format...")
        convert_peft_adapter_format(str(model_path))
    
    print("\nEvaluating model...")
    eval_cmd = [
        "composer",
        "eval/eval.py",
        "eval/yamls/hf_lora_eval.yaml",  # Note: Must rename from .yml to .yaml
        "icl_tasks=eval/yamls/copa.yaml",
        f"variables.model_name_or_path={model_path}",
        f"variables.lora_id_or_path={model_path if IS_PEFT else ''}",  # Add for PEFT models
        f"results_path={save_path}"
    ]
    result = subprocess.run(eval_cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.stderr:
        print("Evaluation errors:", result.stderr)
    
    # Restore original safetensors files
    if IS_PEFT:
        print("Restoring original safetensors files...")
        restore_safetensors_after_eval(str(model_path))
    
    MODEL_CHECKPOINT_VOLUME.commit()  # Commit the new eval results
    print("Evaluation complete!")
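
One optional hardening for the workflow above: passing check=True to subprocess.run makes a nonzero exit from the eval command raise immediately, so a failed run cannot be mistaken for success (shown here with a stand-in command, not the composer invocation):

```python
import subprocess
import sys

def run_eval(cmd: list[str]) -> str:
    # check=True raises CalledProcessError on a nonzero exit code,
    # so a failed eval command fails loudly instead of silently.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Stand-in for the composer eval command.
print(run_eval([sys.executable, "-c", "print('eval ok')"]))  # -> eval ok
```

With check=True, the restore step should go in a try/finally so the safetensors files are put back even when the eval fails.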

Important Notes

  1. File Extension: Please manually rename hf_lora_eval.yml to hf_lora_eval.yaml. All other files in that folder use the .yaml extension, and the configs expect it; the .yml name was almost certainly a typo in llm-foundry.
  2. Global IS_PEFT: The code assumes you're using the global IS_PEFT variable, as in our local implementation.
  3. Path Conversion: Note how Path objects are converted to strings when calling the utility functions.
  4. Working Directory: The utilities expect absolute paths to the model directory.

These changes enable reliable evaluation of PEFT adapter models across different environments.

