
FineTuneLite - Sample Datasets & Testing Guide

📁 Sample Datasets

Three comprehensive datasets are included for testing and demonstration:

1. Customer Support Dataset

File: sample_datasets/customer_support.json
Size: 50 Q&A pairs
Use Case: Customer service chatbot training

Topics Covered:

  • Greetings and basic interactions
  • Product information and specifications
  • Order tracking and shipping
  • Returns and refunds
  • Account management
  • Payment and billing
  • Technical support
  • Policies and procedures

Example:

```json
{
  "input": "How do I reset my password?",
  "output": "To reset your password, click 'Forgot Password' on the login page..."
}
```

2. Technical Q&A Dataset

File: sample_datasets/tech_qa.json
Size: 30 Q&A pairs
Use Case: Programming assistant training

Topics Covered:

  • Python programming
  • JavaScript development
  • Git version control
  • Web development
  • Database management
  • API development
  • Debugging techniques
  • Best practices

Example:

```json
{
  "input": "How do I create a virtual environment in Python?",
  "output": "Use `python -m venv venv` to create a virtual environment..."
}
```

3. Conversation Dataset

File: sample_datasets/conversation.json
Size: 25 Q&A pairs
Use Case: General conversational AI training

Topics Covered:

  • Greetings and farewells
  • Small talk and chitchat
  • Emotional responses
  • AI capabilities and limitations
  • Humor and personality
  • Helpful assistance

Example:

```json
{
  "input": "Tell me a joke",
  "output": "Why don't scientists trust atoms? Because they make up everything! 😄"
}
```
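With the three sample files in place, a quick sanity check is to load each one and count its examples. This is an illustrative sketch (not part of the shipped test scripts); paths match the `sample_datasets/` layout described above:

```python
# Count the examples in each sample dataset (skips files that aren't present).
import json
from pathlib import Path

def count_examples(path):
    """Return the number of records in a JSON dataset file."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return len(records)

if __name__ == "__main__":
    for name in ("customer_support", "tech_qa", "conversation"):
        path = Path("sample_datasets") / f"{name}.json"
        if path.exists():
            print(f"{name}: {count_examples(path)} examples")
```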

🧪 Testing Commands

Quick API Test

```bash
python test_suite.py
```

Tests:

  • Health check
  • LM Studio connectivity
  • Model listing
  • Chat completion
  • Teacher/Critic evaluation
  • Dataset management

Output: HTML report in test_reports/
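Before running the suite, you can probe both services with a few lines of standalone Python. This sketch is not part of test_suite.py; it assumes the backend exposes a /health route on port 8000 (adjust if the actual route differs) and uses LM Studio's OpenAI-compatible /v1/models endpoint:

```python
# Reachability probe for the backend and LM Studio (illustrative sketch).
import urllib.error
import urllib.request

def is_reachable(url, timeout=3):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    targets = {
        "backend": "http://localhost:8000/health",      # assumed route name
        "LM Studio": "http://localhost:1234/v1/models",
    }
    for name, url in targets.items():
        print(f"{name}: {'OK' if is_reachable(url) else 'UNREACHABLE'}")
```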


Comprehensive End-to-End Test

```bash
python comprehensive_test.py
```

Tests:

  • All API endpoints
  • Multiple chat scenarios
  • Teacher/Critic with various inputs
  • Dataset upload for all sample files
  • System statistics
  • Performance benchmarking

Output:

  • Detailed console output with color-coded results
  • Comprehensive HTML performance report
  • Performance metrics and recommendations

📊 What Gets Tested

System Health

  • ✅ Backend availability
  • ✅ LM Studio connectivity
  • ✅ Model availability

Core Features

  • ✅ Model listing from LM Studio
  • ✅ Chat completion with various prompts
  • ✅ Teacher/Critic evaluation
  • ✅ Dataset upload and listing
  • ✅ System resource monitoring

Performance Metrics

  • ⚡ API response times (avg, min, max, median)
  • ⚡ Chat response times
  • ⚡ Model load times
  • ⚡ Dataset upload times
  • ⚡ Success rate percentage
  • ⚡ Throughput (requests/second)
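All of these statistics can be derived from raw timing samples with the standard library alone. A minimal sketch (function and key names are illustrative, not the test scripts' actual internals):

```python
# Summarize response-time samples into the metrics listed above (sketch).
import statistics

def summarize_times(times_s):
    """Compute avg/min/max/median and throughput from response times in seconds."""
    return {
        "avg": statistics.mean(times_s),
        "min": min(times_s),
        "max": max(times_s),
        "median": statistics.median(times_s),
        # Requests per second, assuming the requests ran sequentially.
        "throughput_rps": len(times_s) / sum(times_s),
    }

print(summarize_times([0.2, 0.4, 0.3, 0.5]))
```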

📈 Performance Report Features

The comprehensive test generates an HTML report with:

Summary Cards

  • Total requests
  • Successful requests
  • Failed requests
  • Success rate percentage

Detailed Metrics

  • API response time statistics
  • Chat completion performance
  • Model loading performance
  • Dataset upload performance

Visualizations

  • Success rate progress bar
  • Performance metric tables
  • Feature badges
  • Error reporting

Recommendations

  • Performance optimization tips
  • Troubleshooting guidance
  • System upgrade suggestions
  • Best practices

🎯 Using Sample Datasets

Upload via UI

  1. Start FineTuneLite (start.bat or start.sh)
  2. Navigate to http://localhost:3000/datasets
  3. Click "Upload Dataset"
  4. Select a file from sample_datasets/
  5. View uploaded dataset in list

Upload via API

```bash
curl -X POST http://localhost:8000/datasets/upload \
  -F "file=@sample_datasets/customer_support.json"
```

List Datasets

```bash
curl http://localhost:8000/datasets/
```

🔧 Fine-Tuning with Sample Data

Step 1: Upload Dataset

Use one of the sample datasets or create your own.

Step 2: Configure Fine-Tuning

  1. Go to Fine-tune page
  2. Select base model (IBM Granite 4.0 H Tiny recommended)
  3. Choose uploaded dataset
  4. Set hyperparameters:
    • Epochs: 1-3
    • Batch Size: 1 (for CPU)
    • Learning Rate: 0.0002
    • PEFT Type: LoRA
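Expressed as a plain config dict, the settings above look roughly like this (the key names are illustrative, not FineTuneLite's actual schema):

```python
# Hypothetical representation of the fine-tuning settings above.
training_config = {
    "base_model": "IBM Granite 4.0 H Tiny",  # recommended default
    "dataset": "customer_support.json",      # any uploaded dataset
    "epochs": 3,                             # 1-3 is a sensible range
    "batch_size": 1,                         # keep at 1 for CPU-only machines
    "learning_rate": 2e-4,                   # i.e. 0.0002
    "peft_type": "LoRA",
}

print(training_config)
```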

Step 3: Start Training

Click "Start Training" and monitor progress in Training Jobs page.

Note: The full training implementation is in progress; the current version demonstrates the UI and workflow.


📝 Creating Custom Datasets

JSON Format

```json
[
  {
    "input": "User question or prompt",
    "output": "Expected model response"
  },
  {
    "input": "Another question",
    "output": "Another response"
  }
]
```
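Before uploading a custom file, it can be worth validating it against this format locally. A minimal checker (illustrative only; the backend performs its own validation):

```python
# Validate that parsed JSON matches the [{"input": ..., "output": ...}] format.
import json

def validate_dataset(records):
    """Return a list of problems; an empty list means the dataset looks valid."""
    if not isinstance(records, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"item {i}: expected an object")
            continue
        for key in ("input", "output"):
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: missing or empty '{key}'")
    return problems

data = json.loads('[{"input": "Hi", "output": "Hello!"}]')
print(validate_dataset(data))  # -> [] (no problems)
```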

Best Practices

  • Size: 50-500 examples for fine-tuning
  • Quality: Clear, accurate, consistent responses
  • Diversity: Cover various scenarios and edge cases
  • Format: Consistent structure across all examples
  • Balance: Mix of simple and complex examples

CSV Format (Alternative)

```csv
input,output
"Question 1","Answer 1"
"Question 2","Answer 2"
```

🚀 Performance Expectations

CPU-Only Inference (Ryzen 5, 8GB RAM)

| Operation      | Expected Time | Notes            |
|----------------|---------------|------------------|
| Health Check   | <0.1s         | Instant          |
| Model List     | 0.2-0.5s      | Fast             |
| Chat (short)   | 10-15s        | Model-dependent  |
| Chat (long)    | 15-25s        | Model-dependent  |
| Evaluation     | 10-20s        | Dual model calls |
| Dataset Upload | 0.1-1s        | Size-dependent   |

Optimization Tips

  • Use smaller models for faster responses
  • Reduce max_tokens in requests
  • Close other CPU-intensive apps
  • Use SSD for model storage
  • Consider GPU for production

📞 Troubleshooting

Tests Failing

  1. Check LM Studio: Ensure it's running on port 1234
  2. Load Model: At least one model must be loaded
  3. Backend Running: Verify backend on port 8000
  4. Network: Check localhost connectivity

Slow Performance

  1. CPU Usage: Close other applications
  2. Model Size: Try smaller models
  3. System Resources: Check RAM availability
  4. Temperature: Ensure CPU isn't thermal throttling

Dataset Upload Fails

  1. Format: Verify JSON/CSV format is correct
  2. Size: Ensure the file is under 100 MB
  3. Permissions: Ensure write access to data/uploads/
  4. Backend: Check backend logs for errors

📚 Additional Resources

  • Setup Guide: SETUP_GUIDE.md
  • Deployment: DEPLOYMENT.md
  • Architecture: docs/architecture.md
  • Demo Script: docs/demo_script.md

✅ Quick Checklist

Before testing:

  • LM Studio installed and running
  • IBM Granite 4.0 H Tiny downloaded
  • LM Studio Local Server started (port 1234)
  • Backend running (python -m uvicorn main:app --reload)
  • Frontend running (npm run dev)
  • Sample datasets in sample_datasets/ folder

Run tests:

  • python test_suite.py for quick test
  • python comprehensive_test.py for full analysis
  • Review HTML reports in test_reports/
  • Check performance metrics
  • Follow recommendations

Happy Testing! 🎉

For questions or issues, refer to the comprehensive documentation or check the test reports for detailed diagnostics.