This document provides comprehensive testing procedures for all osync commands and features.
Prerequisites:
- Ollama installed and running on local machine (default: http://localhost:11434)
- At least one model installed locally for testing
- (Optional) Remote Ollama server for testing remote operations
- (Optional) Multiple test models of varying sizes
Local Listing:
# Test basic listing
osync ls
# Expected: Display all local models with ID, size, and modified date
# Test pattern matching
osync ls "llama*"
osync ls "*:7b"
osync ls "*test*"
# Expected: Display only matching models
# Test sorting options
osync ls --size # Largest first
osync ls --sizeasc # Smallest first
osync ls --time # Newest first
osync ls --timeasc # Oldest first
# Expected: Models sorted according to option
Remote Listing:
# Test remote server listing
osync ls http://192.168.0.100:11434
osync ls "llama*" http://192.168.0.100:11434
# Expected: Display models from remote server
Edge Cases:
- Empty model list
- Models with special characters in names
- Models with registry paths (hf.co/user/model)
Local Copy:
# Test local copy (backup)
osync cp llama3 llama3-backup
osync cp qwen2:7b qwen2:backup-v1
# Expected: Create copy of model locally
# Test destination exists
osync cp llama3 <existing-model>
# Expected: Error message preventing overwrite
Local to Remote:
# Test upload to remote server
osync cp llama3 http://192.168.0.100:11434
osync cp qwen2:7b http://192.168.0.100:11434
# Test with bandwidth throttling
osync cp llama3 http://192.168.0.100:11434 -bt 50MB
# Expected: Upload with progress bar, speed limited to 50MB/s
# Test incremental upload (run same command twice)
osync cp llama3 http://192.168.0.100:11434
# Expected: Second run skips already uploaded layers
Remote to Remote:
# Test remote-to-remote transfer (requires registry model)
osync cp http://server1:11434/llama3 http://server2:11434/llama3
# Test with custom buffer size
osync cp http://server1:11434/llama3 http://server2:11434/llama3 -BufferSize 1GB
# Expected: Transfer with memory buffering, progress display
# Test with locally created model
osync cp http://server1:11434/custom-model http://server2:11434/custom-model
# Expected: Error indicating model must be from registry
Edge Cases:
- Very large models (>10GB)
- Network interruptions
- Invalid destination URLs
- Models without :latest tag (should auto-append)
# Test basic rename
osync rename llama3 my-llama3
osync mv qwen2:7b qwen2:backup
# Test rename to existing model
osync ren llama3 <existing-model>
# Expected: Error preventing overwrite
# Test rename with verification
osync rename test-model test-model-v2
# Expected: Copy → Verify → Delete original
Edge Cases:
- Source model doesn't exist
- Destination already exists
- Models with special characters
# Test single model deletion
osync rm test-model
osync rm llama3:backup
# Test pattern deletion
osync rm "test-*"
osync rm "*:backup"
# Expected: Confirmation prompt, then delete matching models
# Test remote deletion
osync rm "old-*" http://192.168.0.100:11434
Edge Cases:
- Model doesn't exist
- Empty pattern match
- Attempting to delete all models (*)
# Test update single model
osync update llama3
# Expected: Update if new version available, or "already up to date"
# Test update all models
osync update
osync update "*"
# Expected: Update all outdated models
# Test pattern update
osync update "llama*"
osync update "*:7b"
# Test remote update
osync update llama3 http://192.168.0.100:11434
osync update "*" http://192.168.0.100:11434
Edge Cases:
- Model already up to date
- Model not in registry
- Network failures during update
# Test local model info
osync show llama3
osync show qwen2:7b
# Expected: Display model metadata, parameters, configuration
# Test remote model info
osync show llama3 http://192.168.0.100:11434
Edge Cases:
- Model doesn't exist
- Model without extended info
# Test pull from registry
osync pull llama3
osync pull qwen2:7b
osync pull hf.co/unsloth/llama3
# Expected: Download model with progress
# Test pull to remote server
osync pull llama3 http://192.168.0.100:11434
# Test pull non-existent model
osync pull fake-model-123
# Expected: Error message indicating model not found
Edge Cases:
- Model already exists locally
- Network failures
- Invalid model names
- Registry unavailable
# Test local chat
osync run llama3
osync chat qwen2:7b
# Expected: Preload model, enter chat mode
# Test remote chat
osync run llama3 http://192.168.0.100:11434
# Test exit methods
# - Type "/bye"
# - Press Ctrl+D
# Expected: Both methods exit cleanly
Edge Cases:
- Model doesn't exist
- Model fails to load
- Very long conversations
# Test local ps
osync ps
# Expected: Show loaded models with VRAM usage, percentage if partially loaded
# Test remote ps
osync ps http://192.168.0.100:11434
# Test with no loaded models
# Expected: "No models currently loaded in memory"
Verification:
- VRAM percentage shows when usage < model size
- Table format matches specification
- Context length and expiration time displayed
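To check the VRAM percentage by hand, the value can be recomputed from the sizes Ollama reports. This is a sketch under assumptions: it takes the loaded model's total size and its VRAM-resident size in bytes (Ollama's /api/ps response exposes these as `size` and `size_vram`), and it assumes osync derives the percentage as `size_vram / size`. The helper name `vram_percent` is made up for illustration.

```bash
# vram_percent VRAM_BYTES TOTAL_BYTES: prints the share of the model held in VRAM.
# Assumed formula (not confirmed against osync's source): 100 * size_vram / size.
vram_percent() {
  awk -v vram="$1" -v total="$2" 'BEGIN {
    if (total <= 0) exit 1            # avoid division by zero
    printf "%.0f\n", 100 * vram / total
  }'
}

# An 8 GB model with 6 GB resident in VRAM is only partially loaded.
vram_percent 6000000000 8000000000   # prints 75
```

A result below 100 is the case in which the percentage is expected to appear in the ps table.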
# Test load model
osync load llama3
osync load qwen2:7b
# Test with custom keep-alive
osync load llama3 --keepalive 30m
osync load llama3 --keepalive 1h
# Test remote load
osync load llama3 http://192.168.0.100:11434
# Verify with ps command
osync ps
# Expected: Model appears in loaded list
Edge Cases:
- Model doesn't exist
- Insufficient VRAM
- Already loaded model
# Test unload model
osync unload llama3
osync unload qwen2:7b
# Test remote unload
osync unload llama3 http://192.168.0.100:11434
# Verify with ps command
osync ps
# Expected: Model no longer in loaded list
Edge Cases:
- Model not loaded
- Model doesn't exist
Prerequisites:
- Multiple quantizations of same model family (e.g., llama3.2:f16, llama3.2:q4_k_m, llama3.2:q8_0)
- Ollama v0.12.11+ (logprobs support required)
- Sufficient time for 50 questions per quantization (~5-10 minutes each)
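The Ollama version requirement can be checked before starting a long qc run. The following is a minimal sketch, not part of osync: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do). The `installed` value is a placeholder to be replaced with the version your Ollama install reports.

```bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper for illustration; relies on `sort -V` (version sort).
version_ge() {
  local lowest
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

installed="0.12.11"   # placeholder, replace with the real version string
if version_ge "$installed" "0.12.11"; then
  echo "logprobs supported"
else
  echo "upgrade Ollama before running qc tests"
fi
```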
Basic Testing:
# Test with minimum required arguments
osync qc -M llama3.2 -Q q4_k_m,q5_k_m
# Expected:
# - Use f16 as base (default)
# - Create llama3.2.qc.json results file
# - Test f16, q4_k_m, q5_k_m (3 quantizations)
# - 50 questions × 3 = 150 total tests
# - Progress bars for each quantization
# - Save after each quantization completes
# Verify results file created
ls llama3.2.qc.json
# Expected: File exists, JSON format
Custom Base Quantization:
# Test with custom base
osync qc -M llama3.2 -Q q4_k_m,q5_k_m -B fp16
# Expected: Use fp16 as base instead of f16
Custom Output File:
# Test custom output filename
osync qc -M llama3.2 -Q q4_k_m,q5_k_m -O my-test-results.json
# Expected: Create my-test-results.json instead of default
Remote Server Testing:
# Test on remote server
osync qc -M llama3.2 -Q q4_k_m,q5_k_m -D http://192.168.0.100:11434
# Expected: Connect to remote, test models there
Incremental Testing:
# Initial test with 2 quantizations
osync qc -M llama3.2 -Q q4_k_m,q5_k_m
# Add more quantizations later
osync qc -M llama3.2 -Q q8_0,q6_k
# Expected:
# - Load existing llama3.2.qc.json
# - Skip f16, q4_k_m, q5_k_m (already tested)
# - Test only q8_0, q6_k (new quantizations)
# - Append results to same file
Custom Test Parameters:
# Test with adjusted parameters
osync qc -M llama3.2 -Q q4_k_m -Te 0.1 -S 42 -To 0.9 -Top 40
# Expected: Run with temperature=0.1, seed=42, top_p=0.9, top_k=40
# Test with penalties
osync qc -M llama3.2 -Q q4_k_m -R 1.1 -F 0.5
# Expected: Apply repeat_penalty=1.1, frequency_penalty=0.5
Progress and Status:
# Monitor during execution
osync qc -M llama3.2 -Q q4_k_m,q5_k_m,q8_0
# Expected output:
# - "Using test suite: v1base (50 questions)"
# - "Creating new results file" or "Loaded existing results file (N quantizations tested)"
# - For each quantization:
# - "Testing quantization: llama3.2:q4_k_m"
# - "Preloading model..."
# - Progress bar: "Testing q4_k_m Reasoning 5/10"
# - "✓ q4_k_m complete"
# - "Testing complete!"
# - "Results saved to: llama3.2.qc.json"
# - "View results with: osync qcview llama3.2.qc.json"
Edge Cases:
# Model doesn't exist
osync qc -M nonexistent -Q q4,q8
# Expected: Error message for each missing model variant
# Quantization mismatch (different families)
osync qc -M llama3.2 -Q q4_k_m
# Then try: osync qc -M qwen2.5 -Q q5_k_m -O llama3.2.qc.json
# Expected: Error - model family mismatch
# Different parameter sizes
osync qc -M llama3.2 -Q q4_k_m # 3B model
# Then: osync qc -M llama3.2 -Q q8_0 -O llama3.2.qc.json # If 7B variant
# Expected: Error - parameter size mismatch
# Invalid temperature/seed
osync qc -M llama3.2 -Q q4_k_m -Te 2.0
# Expected: Accepts (Ollama will handle validation)
# Empty quants list
osync qc -M llama3.2 -Q ""
# Expected: Error - no quantization tags specified
Validation Tests:
# Test model metadata validation
# Ensure all quantizations are same family and parameter size
osync qc -M llama3.2:3b -Q f16,q4_k_m,q8_0
# Expected: All pass (same 3B family)
osync qc -M llama3.2:3b -Q f16
# Then manually edit JSON to change family
# Then: osync qc -M llama3.2:3b -Q q4_k_m
# Expected: Error detecting family mismatch
Interruption Handling:
# Start test, press Ctrl+C during execution
osync qc -M llama3.2 -Q q4_k_m,q5_k_m,q8_0
# Press Ctrl+C after q4_k_m finishes
# Expected:
# - Results file contains completed quantizations (q4_k_m)
# - Can resume later: osync qc -M llama3.2 -Q q5_k_m,q8_0
Tag Format Flexibility (v1.1.7):
# Test with tags without colons (tag portion only)
osync qc -M qwen2 -Q 0.5b-instruct-q8_0,0.5b-instruct-q6_k -B 0.5b-instruct-fp16
# Expected:
# - Constructs full model names: qwen2:0.5b-instruct-fp16, qwen2:0.5b-instruct-q8_0, etc.
# - Uses tag portion for tracking in results file
# - Retrieves actual quantization type from API (details.quantization_level)
# Test with full model names (including colons)
osync qc -M qwen2 -Q qwen2:0.5b-instruct-q8_0,qwen2:0.5b-instruct-q6_k -B qwen2:0.5b-instruct-fp16
# Expected:
# - Uses provided model names as-is
# - Extracts tag portion (after last ":") for tracking
# - Retrieves quantization type from API, not from tag name
# Test mixed format (some with ":", some without)
osync qc -M qwen2 -Q 0.5b-instruct-q8_0,qwen2:0.5b-instruct-q6_k -B 0.5b-instruct-fp16
# Expected: Both formats work correctly in same command
Output Token Limiting (v1.1.7):
# Test that verbose question responses don't hang
osync qc -M llama3.2 -Q q4_k_m
# Expected:
# - Each question limited to 4096 output tokens
# - No hanging on question 8 or other verbose responses
# - All 50 questions complete successfully
# - Responses truncated at token limit if needed
# Monitor progress through all question categories
# Expected categories in order:
# 1. Reasoning (10 questions)
# 2. Math (10 questions)
# 3. Finance (10 questions)
# 4. Technology (10 questions)
# 5. Science (10 questions)
Error Handling and Exit Codes (v1.1.7):
# Test when all models fail to test
osync qc -M nonexistent -Q q4_k_m,q8_0
echo $?
# Expected:
# - Error messages for each model not found
# - Message: "No quantizations were successfully tested - no results file created"
# - Exit code: 1 (failure)
# - No results file created
# Test when some models succeed, some fail
osync qc -M llama3.2 -Q q4_k_m,nonexistent-tag,q8_0
echo $?
# Expected:
# - Success for q4_k_m and q8_0
# - Error for nonexistent-tag
# - Message: "Results saved to: llama3.2.qc.json"
# - Message: "Successfully tested 2 quantization(s)"
# - Exit code: 0 (success, since some results exist)
# - Results file contains only successful tests
Model Size Retrieval (v1.1.7):
# Verify model size is correctly retrieved from /api/tags
osync qc -M llama3.2 -Q q4_k_m
osync qcview -F llama3.2.qc.json
# Expected:
# - Size column shows correct disk size (e.g., "2.7 GB")
# - Size matches what 'osync ls' shows for same model
# - No errors about missing 'size' field during testing
Understanding Scoring Algorithm:
The 4-component scoring system provides comprehensive quality assessment:
1. Logprobs Divergence (70% weight) - Confidence Test:
Purpose: Measures how confident the model is in its token choices
Calculation: 100 × exp(-confidence_difference × 2)
Uses: Sequence-level average confidence (mean logprob)
Range: 0-100%
How it works:
- Calculates average confidence (mean logprob) for each sequence
- Compares base vs quantization confidence levels
- Higher weight because model confidence is the strongest quality indicator
Example:
Base: average logprob -0.5 (good confidence)
Quant: average logprob -0.6 (slightly lower confidence)
→ Small divergence, high score
Base: average logprob -0.5
Quant: average logprob -2.0 (much lower confidence)
→ Large divergence, low score
Interpretation:
95-100%: Quantization maintains very similar confidence levels
85-95%: Slightly less confident in predictions
70-85%: Noticeably different confidence patterns
<70%: Major confidence degradation or uncertainty
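The confidence score can be reproduced by hand from two mean logprobs. This is a minimal sketch, assuming confidence_difference is the absolute difference between the base and quantized mean logprobs (the text does not spell this out). `logprob_score` is a hypothetical helper, not an osync command.

```bash
# logprob_score BASE_MEAN_LOGPROB QUANT_MEAN_LOGPROB
# Computes 100 * exp(-confidence_difference * 2), the formula quoted above.
logprob_score() {
  awk -v base="$1" -v quant="$2" 'BEGIN {
    diff = base - quant
    if (diff < 0) diff = -diff        # absolute difference (assumption)
    printf "%.1f\n", 100 * exp(-diff * 2)
  }'
}

logprob_score -0.5 -0.6   # prints 81.9
logprob_score -0.5 -2.0   # prints 5.0
```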
2. Perplexity (20% weight) - Overall Confidence Test:
Purpose: Measures model's overall confidence (lower perplexity = more confident)
Calculation: 100 × exp(-0.5 × |1 - perplexity_ratio|)
Perplexity: exp(-average_logprob)
Range: 0-100%
How perplexity works:
- Lower perplexity = model is more confident/certain
- Higher perplexity = model is confused/uncertain
- Ratio compares quant perplexity to base perplexity
Example:
Base perplexity: 1.5 (confident)
Quant perplexity: 1.5 → Ratio: 1.0 → Score: 100%
Base perplexity: 1.5
Quant perplexity: 2.0 → Ratio: 1.33 → Score: 84.6%
Base perplexity: 1.5
Quant perplexity: 5.0 → Ratio: 3.33 → Score: 31.1%
Interpretation:
95-100%: Quantization maintains similar overall confidence
85-95%: Slightly less confident overall
70-85%: Noticeably higher uncertainty
<70%: Major confidence degradation
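The perplexity component can be checked with a few lines of awk. The sketch below follows the two formulas as stated, with the ratio taken as quant perplexity divided by base perplexity, matching the worked examples. The function names are hypothetical and the code is not taken from osync.

```bash
# perplexity MEAN_LOGPROB: exp(-average_logprob), as defined above.
perplexity() {
  awk -v lp="$1" 'BEGIN { printf "%.2f\n", exp(-lp) }'
}

# perplexity_score BASE_PPL QUANT_PPL: 100 * exp(-0.5 * |1 - quant/base|).
perplexity_score() {
  awk -v base="$1" -v quant="$2" 'BEGIN {
    if (base <= 0) exit 1             # a perplexity must be positive
    dev = 1 - quant / base
    if (dev < 0) dev = -dev
    printf "%.1f\n", 100 * exp(-0.5 * dev)
  }'
}

perplexity -0.5            # prints 1.65
perplexity_score 1.5 1.5   # prints 100.0
perplexity_score 1.5 2.0   # prints 84.6
```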
3. Token Similarity (5% weight) - Sequence Match Test:
Purpose: Measures if quantization produces similar token sequences
Calculation: Uses Longest Common Subsequence (LCS)
Formula: (LCS_length / max_length) × 100
Range: 0-100%
Example:
Base output: "The capital of France is Paris."
Quant output: "The capital of France is Paris."
LCS match: 100% (all tokens in common subsequence)
Base output: "The capital of France is Paris."
Quant output: "The capital of France is Lyon."
LCS match: ~85% (most tokens match in sequence)
Interpretation:
95-100%: Quantization produces virtually identical outputs
85-95%: Minor word choice differences
70-85%: Noticeable differences in phrasing
<70%: Significant divergence in responses
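The LCS formula can be tried on plain text. This sketch splits on whitespace, which is an assumption: the tool presumably compares model tokens, so its figures can differ slightly (the second example gives 83.3% here versus the roughly 85% quoted above). `lcs_similarity` is a hypothetical helper name.

```bash
# lcs_similarity TEXT_A TEXT_B: (LCS_length / max_length) * 100 over
# whitespace-separated words, using the classic dynamic-programming table.
lcs_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " ")
    m = split(b, y, " ")
    max = (n > m) ? n : m
    if (max == 0) { print "100.0"; exit }   # two empty texts are identical
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        if (x[i] == y[j]) {
          len[i, j] = len[i - 1, j - 1] + 1
        } else {
          up = len[i - 1, j] + 0
          left = len[i, j - 1] + 0
          len[i, j] = (up > left) ? up : left
        }
      }
    }
    printf "%.1f\n", 100 * len[n, m] / max
  }'
}

lcs_similarity "The capital of France is Paris." "The capital of France is Paris."   # prints 100.0
lcs_similarity "The capital of France is Paris." "The capital of France is Lyon."    # prints 83.3
```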
4. Length Consistency (5% weight) - Verbosity Test:
Purpose: Checks if quantization produces similar-length responses
Calculation: 100 × exp(-2 × |1 - length_ratio|)
Range: 0-100%
Example:
Base: 50 tokens
Quant: 50 tokens → Ratio: 1.0 → Score: 100%
Base: 50 tokens
Quant: 45 tokens → Ratio: 0.9 → Score: 81.9%
Base: 50 tokens
Quant: 30 tokens → Ratio: 0.6 → Score: 44.9%
Interpretation:
95-100%: Nearly identical answer lengths
85-95%: Roughly 3-8% length variation (usually acceptable)
70-85%: Significant length differences
<70%: Major verbosity changes (much shorter/longer)
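The length examples can be reproduced with the stated formula. A minimal sketch, assuming length_ratio is the quantized token count divided by the base token count, as the worked examples suggest. `length_score` is a hypothetical helper name.

```bash
# length_score BASE_TOKENS QUANT_TOKENS: 100 * exp(-2 * |1 - quant/base|).
length_score() {
  awk -v base="$1" -v quant="$2" 'BEGIN {
    if (base <= 0) exit 1             # base length must be positive
    dev = 1 - quant / base
    if (dev < 0) dev = -dev
    printf "%.1f\n", 100 * exp(-2 * dev)
  }'
}

length_score 50 50   # prints 100.0
length_score 50 45   # prints 81.9
length_score 50 30   # prints 44.9
```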
Overall Confidence Score:
Weighted average of all four components:
= (Logprobs Divergence × 0.70) +
(Perplexity × 0.20) +
(Token Similarity × 0.05) +
(Length Consistency × 0.05)
Example calculation:
Logprobs Divergence: 88.0% × 0.70 = 61.6%
Perplexity: 90.0% × 0.20 = 18.0%
Token Similarity: 92.5% × 0.05 = 4.6%
Length Consistency: 95.0% × 0.05 = 4.8%
─────────────────────────────────────────
Overall Score: 89.0%
Quality interpretation (color coding):
90-100% (Green): Excellent - minimal quality loss
80-90% (Lime): Very good - acceptable for most uses
70-80% (Yellow): Good - noticeable but manageable degradation
50-70% (Orange): Moderate - quality loss may affect performance
<50% (Red): Poor - significant degradation
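The weighted average and the color bands can be checked against a results file by hand. This is a sketch using the weights and thresholds listed above. Treating each lower bound as inclusive (for example, exactly 90 counts as Green) is an assumption, and both function names are hypothetical.

```bash
# overall_score LOGPROBS PERPLEXITY TOKEN_SIM LENGTH: weighted average (70/20/5/5).
overall_score() {
  awk -v lp="$1" -v ppl="$2" -v tok="$3" -v len="$4" 'BEGIN {
    printf "%.1f\n", lp * 0.70 + ppl * 0.20 + tok * 0.05 + len * 0.05
  }'
}

# quality_band SCORE: maps a score to the color band used for display.
quality_band() {
  awk -v s="$1" 'BEGIN {
    if (s >= 90) print "Green"
    else if (s >= 80) print "Lime"
    else if (s >= 70) print "Yellow"
    else if (s >= 50) print "Orange"
    else print "Red"
  }'
}

score=$(overall_score 88.0 90.0 92.5 95.0)
echo "$score $(quality_band "$score")"   # prints: 89.0 Lime
```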
Practical Testing Example:
# Test llama3.2 quantizations
osync qc -M llama3.2 -Q q4_k_m,q5_k_m,q8_0 -B f16
# Expected typical results (example):
# f16 (base): 100% overall (by definition)
# q8_0: 95-98% (excellent, minimal loss)
# q5_k_m: 90-94% (very good balance)
# q4_k_m: 85-90% (good, some quality trade-off)
# q2_k: 70-80% (moderate loss, much smaller)
# View results
osync qcview llama3.2.qc.json
# Analyze component scores to understand where quality is lost:
# - High token similarity but low logprobs → same words, less confident
# - Low token similarity but high logprobs → different words, but confident
# - Low length consistency → responses much shorter/longer
# - Low perplexity score → overall uncertainty increased
Serial Judge Mode (Default):
# Local judge model - runs after each quantization
osync qc -M llama3.2 -Q q4_k_m,q8_0 --judge mistral
# Expected:
# - Testing q4_k_m...
# - ✓ Completed q4_k_m
# - Running judgment for: q4_k_m
# - Judging q4_k_m [=== ] 30%
# - ✓ Judging q4_k_m complete
# - Testing q8_0...
# - (continues...)
Parallel Judge Mode:
# Parallel mode - judging happens after all testing completes
osync qc -M llama3.2 -Q q4_k_m,q8_0 --judge mistral --mode parallel
# Expected:
# - Testing q4_k_m...
# - Testing q8_0...
# - Running parallel judgment for 2 quantization(s)...
# - Judging all quantizations [===== ] 50%
# - ✓ All judgments complete
Remote Judge Model:
# Judge on different server
osync qc -M llama3.2 -Q q4_k_m --judge http://192.168.1.100:11434/mistral
# Expected:
# - Judge model: mistral:latest @ http://192.168.1.100:11434
# - Tests run normally, judgment calls go to remote server
Remote Test Models + Local Judge:
# Test models on remote, judge locally
osync qc -D http://192.168.1.100:11434/ -M llama3.2 -Q q4_k_m --judge mistral
# Expected:
# - Model testing uses remote server
# - Judgment uses local server (localhost:11434)
Judge Skip Logic:
# First run with judge
osync qc -M llama3.2 -Q q4_k_m --judge mistral
# Second run - should skip existing judgments
osync qc -M llama3.2 -Q q8_0 --judge mistral
# Expected: q4_k_m judgment skipped (already exists), q8_0 judged
# Force re-judgment
osync qc -M llama3.2 -Q q4_k_m --judge mistral --force
# Expected: q4_k_m re-judged even though judgment exists
# Different judge model - should re-judge
osync qc -M llama3.2 -Q q4_k_m --judge llama3.2
# Expected: All questions re-judged because different model
Judge Results Verification:
# Run with judge
osync qc -M llama3.2 -Q q4_k_m,q8_0 --judge mistral
# View results with judgment
osync qcview llama3.2.qc.json
# Expected output:
# - Header shows "Judge Model: mistral:latest (50% metrics + 50% judgment)"
# - Table shows columns: Final Score, Metrics Score, Judge Score
# - Results sorted by Final Score (not just Metrics)
# JSON output should include judgment data
osync qcview llama3.2.qc.json -Fo json
# Verify JSON contains:
# - "HasJudgmentScoring": true
# - "JudgeModel": "mistral:latest"
# - "AverageJudgmentScore" per quantization
# - "JudgmentScore" per question
Partial Judgment Handling:
# If only some quants have judgment, metrics-only mode is used
# Create results with one judged quant
osync qc -M llama3.2 -Q q4_k_m --judge mistral
# Add another quant without judgment
osync qc -M llama3.2 -Q q8_0
# View results
osync qcview llama3.2.qc.json
# Expected: Uses metrics-only display (no Judge Score column)
# - Because not ALL quants have judgment scoring
Judge Model Edge Cases:
# Invalid judge URL format
osync qc -M llama3.2 -Q q4_k_m --judge http://192.168.1.100
# Expected: Warning about invalid format
# Non-existent judge model
osync qc -M llama3.2 -Q q4_k_m --judge nonexistent-model
# Expected: Warning about failed to preload judge model, judgment skipped
# Judge model without tag (should add :latest)
osync qc -M llama3.2 -Q q4_k_m --judge mistral
# Expected: Uses mistral:latest
Judge Scoring Verification:
# Run tests and check scoring
osync qc -M llama3.2 -Q q4_k_m,q8_0 --judge mistral
osync qcview llama3.2.qc.json
# Verify scoring behavior:
# - Judge scores: 1-100 per question
# - AverageJudgmentScore: average of all question judge scores
# - FinalScore = (MetricsScore × 0.5) + (JudgeScore × 0.5)
# - Higher-quality quants should score higher on both metrics AND judge scores
# Check JSON for detailed per-question judgment
osync qcview llama3.2.qc.json -Fo json -O results.json
# Each QuestionResult should have Judgment: { JudgeModel, Score, JudgedAt }
Basic Table View (Console):
# View results in table format (file as positional argument)
osync qcview llama3.2.qc.json
# Expected output:
# - Header panel with model info, test suite, options
# - Base model info panel
# - Main results table with columns:
# - Tag, Quant, Size, Overall Score
# - Token Similarity, Logprobs Divergence, Length Consistency, Perplexity
# - Eval Speed, Eval vs Base, Prompt Speed, Prompt vs Base
# - Category breakdown table
# - Color coding: green ≥90%, lime ≥80%, yellow ≥70%, orange ≥50%, red <50%
Table Export to File (v1.1.9):
# Export table to text file
osync qcview llama3.2.qc.json -O report.txt
# Expected output:
# - Message: "Table results saved to: report.txt (X.XX KB)"
# - File contains plain text table (no ANSI colors)
# - All columns properly aligned with correct values
# - Eval/Prompt speeds formatted as numbers (not "F1-12" bug)
JSON Output:
# View as JSON in console
osync qcview llama3.2.qc.json -Fo json
# Expected:
# - "[yellow]JSON Results:[/]" header
# - JSON output without markup parsing errors (fixed in v1.1.9)
# - Contains: BaseModelName, BaseTag, BaseFamily, etc.
# - QuantScores array with all quantizations
# - CategoryScores for each quantization
# - QuestionScores array (detailed per-question breakdown)
# Export to JSON file
osync qcview llama3.2.qc.json -Fo json -O report.json
# Expected:
# - Message: "JSON results saved to: report.json (X.XX KB)"
# - File contains valid, indented JSON
Results Validation:
# Test with valid results file
osync qcview llama3.2.qc.json
# Verify:
# - Overall scores are 0-100%
# - Category scores are 0-100%
# - Performance percentages make sense (faster quants > 100%)
# - Disk sizes match model sizes
# - Quantization types are correct
# - Eval/Prompt speeds show actual numbers (not format strings)
# Test score color coding (v1.1.9 updated thresholds)
# Green (90-100%): Excellent preservation
# Lime (80-90%): Very good
# Yellow (70-80%): Good
# Orange (50-70%): Moderate loss
# Red (<50%): Significant degradation
Edge Cases:
# File doesn't exist
osync qcview nonexistent.json
# Expected: "Error: Results file not found: nonexistent.json"
# Invalid JSON file
echo "invalid" > test.json
osync qcview test.json
# Expected: "Error: Failed to parse results file"
# Empty results (no quantizations tested)
# Create results file with empty Results array
osync qcview empty-results.json
# Expected: "No results found in file"
# Missing base quantization
# Edit JSON to remove IsBase flag from all entries
osync qcview no-base.json
# Expected: "Error: No base quantization found in results"
# Invalid format parameter
osync qcview llama3.2.qc.json -Fo invalid
# Expected: Fallback to table format
# JSON to console with special characters (v1.1.9 fix)
osync qcview llama3.2.qc.json -Fo json
# Expected: No "Malformed markup tag" error
# JSON [ and ] characters not interpreted as Spectre.Console markup
Verification Tests:
# Compare table vs JSON output
osync qcview llama3.2.qc.json -O table.txt
osync qcview llama3.2.qc.json -Fo json -O output.json
# Manually verify:
# - Scores match between table and JSON
# - All quantizations appear in both
# - Category breakdowns are consistent
# - File sizes displayed correctly in output messages
Performance Metrics Validation:
# Verify performance percentages
osync qcview llama3.2.qc.json
# Check:
# - Smaller quants (q4_k_m) should have > 100% speed (faster)
# - Larger quants (q8_0) might have < 100% speed (slower)
# - f16/fp16 base should show 100% (baseline)
# - Prompt and eval speeds show actual tok/s values
# - Arrows: ↑ for faster than base, ↓ for slower, = for same
Category Breakdown Tests:
# Verify category scores
osync qcview llama3.2.qc.json
# Check category breakdown table shows:
# - All 5 categories: Reasoning, Math, Finance, Technology, Science
# - Scores per category for each quantization
# - Consistent with overall confidence score
# - Category scores average to overall score (weighted)
Table Formatting (v1.1.9 fixes):
# Test table rendering to console
osync qcview llama3.2.qc.json
# Verify:
# - Columns align properly
# - Numbers formatted correctly (1 decimal place for %)
# - Color codes work in terminal
# - Unicode arrows (↑↓) display for performance
# - Sizes shown in human-readable format (GB, MB)
# Test table export to file
osync qcview llama3.2.qc.json -O report.txt
# Verify:
# - Eval speed shows numbers like "122.8" (not "F1-12")
# - Prompt speed shows numbers like "1609.7" (not "F1-12")
# - All columns correctly formatted with values
Output File Tests:
# Table export to file (v1.1.9)
osync qcview llama3.2.qc.json -O report.txt
# Expected: "Table results saved to: report.txt (X.XX KB)"
# Verify file exists and contains properly formatted table
# JSON export to file
osync qcview llama3.2.qc.json -Fo json -O report.json
# Expected: "JSON results saved to: report.json (X.XX KB)"
# Verify file exists and contains valid JSON
# Overwrite existing file
osync qcview llama3.2.qc.json -Fo json -O report.json
# Expected: Overwrites without warning, shows new size
# Invalid output path
osync qcview llama3.2.qc.json -Fo json -O /invalid/path/file.json
# Expected: Error about invalid path or permissions
# Test local manage
osync manage
# Test remote manage
osync manage http://192.168.0.100:11434
Basic Navigation:
- Up/Down arrows - Navigate model list
- Page Up/Down - Scroll page
- Home/End - Jump to start/end
- Expected: Smooth navigation, selected row highlighted
Test Cases:
- Press / to start filtering
- Type "llama"
- Expected: Only llama models shown
- Type more characters
- Expected: List updates in real-time
- Press Esc
- Expected: Filter cleared, all models shown
- Check top bar shows "Filter: "
Test Cases:
- Press Ctrl+O repeatedly
- Expected: Cycle through sort modes
- Verify top bar updates: Name+, Name-, Size-, Size+, Created-, Created+
- Verify models reorder correctly for each mode
- Verify selected model stays selected during sort
Test Cases:
- Press Ctrl+T repeatedly
- Expected: Cycle through themes
- Verify themes: Default, Dark, Blue, Solarized, Gruvbox, Nord, Dracula
- Verify colors change for: top bar, list rows, selected row, bottom bar
Single Model Copy:
- Select a model
- Press Ctrl+C
- Enter destination name (local copy)
- Expected: Dialog with text field, Enter key works
- Verify copy succeeds
Batch Copy:
- Press Space to select multiple models
- Press Ctrl+C
- Enter remote server URL (required for batch)
- Expected: Copy all selected models with progress
- Verify models copied incrementally
- Expected: Return to same model position
Edge Cases:
- Empty destination
- Destination exists
- Network failures (remote)
Test Cases:
- Select a model
- Press Ctrl+M
- Enter new name
- Press Enter
- Expected: Model renamed, cursor stays on renamed model
Edge Cases:
- Empty new name
- Destination exists
- Source doesn't exist (deleted meanwhile)
Test Cases:
- Select a model
- Press Ctrl+R
- Expected: Exit TUI, show console, preload model, enter chat
- Type message, verify response
- Type /bye or Ctrl+D
- Expected: Return to manage TUI, same model selected
Test Cases:
- Select a model
- Press Ctrl+S
- Expected: Exit TUI, show model info in console
- Press any key
- Expected: Return to manage TUI, same model selected
Single Delete:
- Select a model
- Press Ctrl+D
- Confirm deletion
- Expected: Model deleted, cursor moves to next/previous model
Batch Delete:
- Press Space to select multiple models
- Press Ctrl+D
- Confirm deletion
- Expected: All selected models deleted
Edge Cases:
- Delete last model in list
- Delete first model in list
- Cancel confirmation
Single Update:
- Select a model
- Press Ctrl+U
- Expected: Exit TUI, show console with update progress
- Verify "updated successfully" or "already up to date"
- Press any key
- Expected: Return to manage TUI, same model selected
Batch Update:
- Press Space to select multiple models
- Press Ctrl+U
- Expected: Update all selected models sequentially
- Verify each shows correct status
Test Cases:
- Press Ctrl+P
- Enter model name (e.g., "llama3")
- Press Enter
- Expected: Validate model exists on ollama.com
- If valid: Exit TUI, show console with pull progress
- If invalid: Error dialog
- Press any key to return
Edge Cases:
- Non-existent model
- Model already exists
- Network failures
Test Cases:
- Select a model
- Press Ctrl+L
- Expected: Load model into memory
- Verify with Ctrl+X (ps)
Test Cases:
- Select a loaded model
- Press Ctrl+K
- Expected: Unload model from memory
- Verify with Ctrl+X (ps)
Test Cases:
- Load a model (Ctrl+L)
- Press Ctrl+X
- Expected: Dialog showing loaded models in table format
- Verify VRAM percentage shows when partially loaded
- Verify output matches CLI osync ps format
- Press Close or Esc
Test Cases:
- Press Space on multiple models
- Verify [X] checkbox appears
- Press Space again to deselect
- Verify [ ] checkbox appears
- Test batch operations (copy, delete, update) with selection
Test Cases:
- Press Ctrl+Q
- Expected: Confirmation dialog
- Select "No"
- Expected: Stay in manage
- Press Ctrl+Q again, select "Yes"
- Expected: Exit cleanly
- Same tests with Esc key
# Launch REPL
osync
# Test tab completion
> <Tab>
# Expected: Show available models
> cp <Tab>
# Expected: Show available models
# Test command history
> ls
> cp llama3 backup
> <Up arrow>
# Expected: Show previous command
# Test exit
> exit
# Expected: Exit cleanly
# Test with 10GB+ model
osync cp large-model http://remote:11434
# Monitor: Progress accuracy, speed calculation, memory usage
# In manage TUI: Select 10+ models
# Copy to remote server
# Monitor: Progress, memory usage, completion time
# Start large transfer
# Disconnect network briefly
# Expected: Error handling, no corruption
Run after any code changes:
Basic Workflow:
osync ls
osync cp test-model test-backup
osync rename test-backup test-renamed
osync rm test-renamed
Manage TUI Workflow:
osync manage
# Navigate, filter, sort, select, copy, delete
# Cycle themes, check all keyboard shortcuts
Remote Operations:
osync ls http://remote:11434
osync cp local-model http://remote:11434
osync update remote-model http://remote:11434
Model not found:
osync cp non-existent-model backup
# Expected: Clear error message
Server unreachable:
osync ls http://invalid-server:11434
# Expected: Connection error message
Insufficient permissions:
osync rm system-model
# Expected: Permission error if applicable
Invalid patterns:
osync ls "["
# Expected: Pattern error or no matches
- Test with PowerShell and CMD
- Verify path handling with backslashes
- Test special characters in model names
- Test with bash and zsh
- Verify file permissions
- Test line endings (CRLF vs LF)
- Test on ARM and x64
- Verify terminal compatibility
- Test with iTerm2 and Terminal.app
#!/bin/bash
# Basic osync test suite
echo "Testing list command..."
osync ls || exit 1
echo "Testing copy command..."
osync cp test-model test-backup || exit 1
echo "Testing rename command..."
osync rename test-backup test-renamed || exit 1
echo "Testing delete command..."
osync rm test-renamed || exit 1
echo "All tests passed!"
When reporting issues, include:
- Command used: Exact command that failed
- Expected behavior: What should happen
- Actual behavior: What actually happened
- Environment:
- OS and version
- .NET version
- Ollama version
- osync version
- Steps to reproduce: Minimal test case
- Logs/Screenshots: Any relevant output
Remote-to-Remote Copy:
- Only works with registry models
- Locally created models cannot be transferred
- Requires registry.ollama.ai access
TUI Mode:
- Requires terminal size ≥ 80x24
- Colors may vary across terminals
- SSH sessions may have display issues
Pattern Matching:
- Only * wildcard supported
- No regex support
- Case-sensitive matching
- ls (list)
- cp (copy)
- rename/mv/ren
- rm/delete/del
- update
- show
- pull
- run/chat
- ps (process status)
- load
- unload
- qc (quantization comparison)
- qcview (view comparison results)
- manage
- Pattern matching
- Sorting (name, size, time)
- Filtering (live search)
- Batch operations
- Theme switching
- Progress tracking
- Bandwidth throttling
- Remote operations
- Memory management
- Quantization quality testing
- Logprobs analysis
- Incremental testing
- Tab completion
- Command history
Recommended testing schedule:
- Before commit: Basic regression tests
- Before release: Full test suite
- After deployment: Smoke tests on production
- Weekly: Performance benchmarks
- Monthly: Security audit
When adding new features:
- Add test cases to this document
- Test on all supported platforms
- Document edge cases
- Update regression test suite
- Verify backward compatibility