A comprehensive LLM fine-tuning project that creates a ZeroTwo character chatbot from the Qwen2-VL-7B model using LoRA (Low-Rank Adaptation).
This project fine-tunes the Qwen2-VL-7B-Instruct model to create an AI assistant that embodies the personality of ZeroTwo from "Darling in the Franxx" anime. The model is trained to respond in a flirty, human-like manner while maintaining the character's emotional depth.
```
AI_ML/
├── cleaner/           # Data processing utilities
│   ├── __init__.py
│   ├── extractor.py   # Data extraction logic
│   └── models.py      # Pydantic models for data validation
├── logs/              # Training and application logs
│   ├── all.log
│   ├── error.log
│   ├── info.log
│   └── warning.log
├── train_model.py     # Main training script
├── test_model.py      # Model testing script
├── utils.py           # Core utility functions
├── settings.py        # Configuration settings
├── log_config.py      # Logging configuration
├── pyproject.toml     # Project dependencies
└── README.md          # This file
```
- Python 3.10 or higher
- CUDA-compatible GPU (recommended)
- UV package manager (or pip)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd AI_ML
   ```

2. Install dependencies

   ```bash
   uv sync
   # or with pip
   pip install -r requirements.txt
   ```

3. Prepare your training data

   - Place your conversation data in JSONL format
   - Default path: `training_data.jsonl`
   - Format: each line should contain a JSON object with a `"messages"` field
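As an illustration, one line of `training_data.jsonl` could look like the record below. The roles and example texts are assumptions based on the common chat format; adapt them to whatever your tokenizer's chat template expects.

```python
import json

# One JSONL line: a JSON object with a "messages" list of chat turns.
# The roles and contents here are illustrative, not taken from the project.
line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a friendly anime character ZeroTwo..."},
        {"role": "user", "content": "Do you want to ride a franxx zero two?"},
        {"role": "assistant", "content": "Only if you're my darling~"},
    ]
})

record = json.loads(line)
print(len(record["messages"]))  # each record holds one full conversation -> 3 turns
```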
All settings are managed through `settings.py` using Pydantic. Key configurations include:

- Base Model: `unsloth/Qwen2-VL-7B-Instruct-unsloth-bnb-4bit`
- Max Sequence Length: 2048 tokens
- 4-bit Quantization: Enabled for memory efficiency
- Rank: 64
- Alpha: 128
- Dropout: 0.05
- Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Batch Size: 2 per device
- Gradient Accumulation: 4 steps
- Learning Rate: 2e-5
- Epochs: 5
- Optimizer: AdamW 8-bit
- Precision: BF16/FP16 (auto-detected)
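The batch settings above combine into an effective batch size via gradient accumulation. A quick sanity check (the dataset size of 1,000 conversations is a made-up figure for illustration):

```python
# Effective batch size = per-device batch * gradient accumulation steps
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch)  # 8

# Hypothetical dataset of 1,000 conversations: optimizer steps per epoch
num_examples = 1000
steps_per_epoch = num_examples // effective_batch
print(steps_per_epoch)  # 125
```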
The training process follows these steps:

1. Model Loading (`_load_model`)
   - Loads the pre-trained Qwen2-VL-7B model
   - Applies 4-bit quantization for memory efficiency
   - Initializes the tokenizer with chat template support

2. Data Preparation (`_load_data`)
   - Loads training data from the JSONL file
   - Applies chat template formatting
   - Converts conversations to the training format

3. Pre-training Evaluation
   - Tests model generation before training
   - Uses the configured test message and system prompt
   - Establishes baseline performance

4. Model Configuration for Training
   - Enables gradient checkpointing
   - Enables input gradients
   - Switches to training mode

5. LoRA Adapter Setup (`_get_trainer`)
   - Adds LoRA adapters to the target modules
   - Configures rank, alpha, and dropout parameters
   - Uses RSLoRA for improved performance

6. Training Execution
   - Uses SFTTrainer (Supervised Fine-Tuning)
   - Implements assistant-only loss for better alignment
   - Supports gradient accumulation and checkpointing

7. Post-training Evaluation
   - Tests model generation after training
   - Compares with the pre-training baseline
   - Validates training effectiveness

8. Model Saving
   - Option 1: Push to the Hugging Face Hub
   - Option 2: Save locally to `./new_model`
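The chat template formatting in step 2 is done by the tokenizer itself; purely as an illustration, a Qwen2-style ChatML template renders a conversation roughly like this (this helper is a sketch, not the project's code):

```python
def render_chatml(messages):
    """Rough sketch of a ChatML-style template. In the real pipeline this is
    handled by tokenizer.apply_chat_template, not a hand-written function."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)

text = render_chatml([
    {"role": "user", "content": "Hi, ZeroTwo!"},
    {"role": "assistant", "content": "Hello, darling~"},
])
print(text)
```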
```bash
python train_model.py
```

Training Output:

- Real-time training progress with colored console output
- Automatic logging to the `logs/` directory
- Pre- and post-training model comparisons
- Interactive model saving options
```bash
python test_model.py
```

- Model Loading: Loads the fine-tuned model from the Hugging Face Hub or a local path
- Generation Testing: Tests the model with configured prompts
- Parameter Control: Configurable temperature, top_p, and token limits
- System Prompt: Uses the ZeroTwo character system prompt
```python
# Default test settings
user_test_message = "Do you want to ride a franxx zero two?"
system_prompt = "You are a friendly anime character ZeroTwo..."
max_new_tokens = 512
temperature = 0.25
top_p = 0.1
```

- Parameters: 7 billion
- Architecture: Vision-Language model
- Quantization: 4-bit for efficiency
- Context Length: 2048 tokens
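With `top_p = 0.1`, sampling is nearly greedy: only the smallest set of tokens whose probabilities reach 0.1 survives the nucleus filter. A minimal pure-Python sketch of that filtering step (the token probabilities are invented for illustration):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p; everything else is dropped and the rest is renormalised."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        total += p
        if total >= top_p:
            break
    return {t: p / total for t, p in kept.items()}

probs = {"darling": 0.6, "hello": 0.3, "hmm": 0.1}
print(top_p_filter(probs, 0.1))  # {'darling': 1.0} -- near-greedy at top_p=0.1
```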
- Technique: Low-Rank Adaptation
- Benefits:
- Reduced memory usage
- Faster training
- Preserves base model knowledge
- Easy model switching
- Method: Supervised Fine-Tuning (SFT)
- Loss: Assistant-only loss (focuses on response quality)
- Optimization: AdamW with 8-bit precision
- Regularization: Gradient clipping, dropout
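Assistant-only loss means that tokens from the system and user turns are masked out of the loss (label `-100` in the usual Hugging Face convention), so gradients come only from the assistant's replies. A schematic of that masking, with made-up token IDs:

```python
IGNORE_INDEX = -100  # Hugging Face convention: the loss skips these positions

def mask_non_assistant(token_ids, is_assistant):
    """Labels equal the token IDs on assistant tokens, IGNORE_INDEX elsewhere."""
    return [tid if flag else IGNORE_INDEX
            for tid, flag in zip(token_ids, is_assistant)]

tokens = [101, 102, 103, 104, 105]         # made-up IDs: prompt then reply
flags  = [False, False, True, True, True]  # last three belong to the assistant turn
print(mask_non_assistant(tokens, flags))   # [-100, -100, 103, 104, 105]
```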
The model is trained to embody ZeroTwo's personality:
- Flirty and playful communication style
- Emotional depth - can express sadness, anger, frustration
- Human-like responses that feel natural
- Context-aware reactions based on user tone
- Anime character authenticity from Darling in the Franxx
- All logs: `logs/all.log`
- Error logs: `logs/error.log`
- Info logs: `logs/info.log`
- Warning logs: `logs/warning.log`
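Splitting one log stream into per-level files is typically done with level filters on separate handlers. A stdlib-only sketch of the idea, writing to in-memory streams instead of files (the project's actual `log_config.py` may be set up differently):

```python
import io
import logging

def make_handler(stream, level):
    """Handler that accepts exactly one level, the usual per-level-file trick."""
    h = logging.StreamHandler(stream)
    h.setLevel(level)
    h.addFilter(lambda record: record.levelno == level)
    return h

error_stream, info_stream = io.StringIO(), io.StringIO()
logger = logging.getLogger("zerotwo_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(make_handler(error_stream, logging.ERROR))
logger.addHandler(make_handler(info_stream, logging.INFO))

logger.info("training started")
logger.error("CUDA out of memory")

print(info_stream.getvalue())   # only the INFO line lands here
print(error_stream.getvalue())  # only the ERROR line lands here
```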
- Loss tracking every 5 steps
- Gradient norm monitoring
- Learning rate scheduling
- Best model checkpointing
- Update `character_name` in settings
- Modify `system_prompt` for personality
- Adjust `user_test_message` for testing
- Update training data accordingly
```python
# In settings.py
training_args = TrainingArguments(
    per_device_train_batch_size=2,  # Adjust based on GPU memory
    num_train_epochs=5,             # Increase for more training
    learning_rate=2e-5,             # Fine-tune learning rate
    warmup_steps=50,                # Adjust warmup period
    # ... other parameters
)
```

```python
# Adjust LoRA parameters
lora_rank = 64       # Higher rank = more parameters
lora_alpha = 128     # Scaling factor
lora_dropout = 0.05  # Regularization
```
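To see what rank 64 costs: each adapted weight matrix W (shape d_out × d_in) gains two small matrices B (d_out × r) and A (r × d_in), and the effective weight becomes W + (alpha / r) · BA. Counting trainable parameters for one hypothetical 4096 × 4096 projection (actual Qwen2-VL layer shapes vary per module):

```python
# LoRA adds B (d_out x r) and A (r x d_in) per target matrix;
# the update B @ A is scaled by alpha / r before being added to W.
lora_rank, lora_alpha = 64, 128

def lora_params(d_out, d_in, r=lora_rank):
    return d_out * r + r * d_in

# Hypothetical 4096 x 4096 projection, vs ~16.8M frozen params in W itself
print(lora_params(4096, 4096))  # 524288 trainable parameters
print(lora_alpha / lora_rank)   # scaling factor applied to B @ A -> 2.0
```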
1. CUDA Out of Memory
   - Reduce `per_device_train_batch_size`
   - Increase `gradient_accumulation_steps`
   - Enable gradient checkpointing
   - Reduce

2. Slow Training
   - Ensure CUDA is available
   - Check GPU utilization
   - Adjust `dataloader_num_workers`

3. Poor Model Performance
   - Increase training epochs
   - Adjust learning rate
   - Improve training data quality
- 4-bit quantization enabled by default
- Gradient checkpointing for memory efficiency
- Pin memory for faster data loading
- 8-bit optimizer for reduced memory usage
Core dependencies (see `pyproject.toml`):

- `unsloth>=2025.11.1` - Efficient LLM training
- `trl>=0.23.0` - Transformer Reinforcement Learning
- `pydantic>=2.12.3` - Data validation
- `pydantic-settings>=2.11.0` - Settings management

Development dependencies:

- `black>=25.9.0` - Code formatting
- `icecream>=2.1.8` - Debugging
Note: This project is for educational purposes only. Ensure you have appropriate permissions for any training data used.