feat: 7-step wizard, experiment tracking, deploy tab, API package refactor#44
Conversation
…efactor
## New features
- 7-step fine-tuning wizard (model → intent → data → configure → train → results → deploy)
- Experiment tracking with SQLite persistence (app/state/experiment_state.py)
- Deploy tab: adapter download, HF Hub push, merged model, GGUF export, GitHub push
- Merge + GGUF export workers (trainer/merge.py, workers/merge_task.py)
- Synthetic dataset generation and HuggingFace Hub dataset preview in wizard step 3
- AI commentary on training epochs via /api/jobs/{id}/commentary
## Refactoring
- Split 783-line app/api.py monolith into app/api/ package (schemas, deps, system, models, datasets, jobs, experiments)
- Add app/components/finetune/ step components for wizard steps 1–4
- Fix rx.Base crash: ExperimentRun now uses pydantic.BaseModel (rx.Base removed in current Reflex)
- Fix experiment DB path: absolute path from __file__ instead of relative ./storage
- Fix rxconfig.py SitemapPlugin startup warning
## Docs
- README: add "What's New" section, Experiment Tracking and Model Deployment capability rows
Closes #13, #12, #11
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 26 minutes and 36 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (25)
📝 WalkthroughWalkthroughTuneOS transitions from a monolithic 5-step finetune page to a modular 7-step wizard architecture with REST APIs, SQLite experiment tracking, and enhanced deployment. The backend is refactored from a single ChangesComplete TuneOS Modernization
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
pyproject.toml had diverged from the lock file, causing all CI jobs to fail at the dependency install step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename loop variable i → _i (B007) - Add strict=False to zip() call (B905, auto-fix) - Sort/organise import blocks (I001, auto-fix) - Remove unused import (F401, auto-fix) - Apply ruff format across 19 files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_api: dataset_path is now optional (hub dataset jobs omit it); rename test to reflect correct expected behaviour (201 not 422) - test_workers: finetune() returns (output_path, model, tokenizer); fix mock return_value to be a 3-tuple so unpacking doesn't raise Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Docker build was failing because the root Dockerfile referenced hf_spaces/nginx.conf and hf_spaces/entrypoint.sh which didn't exist. - Add hf_spaces/nginx.conf: nginx proxies port 7860 → Reflex 3000 / API 8000 - Add hf_spaces/entrypoint.sh: full-stack boot (Redis, Celery, Reflex, nginx) test_evaluate.py was failing because only 'torch' was mocked but trainer/evaluate.py imports from torch.utils.data. Python's import machinery requires all submodule names to be present in sys.modules when the parent is a MagicMock (not a real package). - Add torch.utils, torch.utils.data, torch.nn, torch.cuda to mock chain All 41 tests pass (excluding test_pyqt.py which requires a display). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (24)
app/api/deps.py-28-33 (1)
28-33:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't collapse backend failures into
"unknown".Line 32 turns Redis/import failures into the same payload as a genuinely missing job. That makes UI error handling and operator debugging much harder, especially for the clear-error-surfacing objective.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/deps.py` around lines 28 - 33, The current _get_job_status_from_redis function swallows all exceptions and returns the same {"status": "unknown", "job_id": job_id} for import/Redis failures and for genuinely missing jobs; change it so backend/import/Redis exceptions are surfaced separately by catching only the "job not found" case (or checking for None) and returning {"status":"unknown","job_id":job_id} only when the job truly doesn't exist, while other exceptions (ImportError, redis.ConnectionError, etc.) should be caught and returned as an error payload such as {"status":"error","job_id":job_id,"error": str(e)} or re-raised; locate the logic in _get_job_status_from_redis and references to get_job_status to implement this distinction and include the exception message for operator debugging.app/api/datasets_routes.py-48-57 (1)
48-57:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle datasets that don’t define a
trainsplit
Hardcodingsplit="train[:5]"can raise for Hub datasets that only expose other splits (e.g.,validation/test). Before callingload_dataset, list available split names withdatasets.get_dataset_split_names(...)for the dataset/config and load a split that exists (prefertrainif present, otherwise fall back to the first available), then apply[:5]to that chosen split.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/datasets_routes.py` around lines 48 - 57, The _load function currently hardcodes split="train[:5]" which fails for datasets without a train split; update _load to first call datasets.get_dataset_split_names(dataset_id, config_name=...) (or get_dataset_config_names if needed) to retrieve available splits, pick "train" if present otherwise the first available split, then call load_dataset with f"{chosen_split}[:5]" (keep trust_remote_code=True) to load only 5 rows; keep the rest of the logic that builds rows and ensure the outer await asyncio.get_event_loop().run_in_executor(None, _load) still calls the updated _load.app/api/deps.py-11-13 (1)
11-13:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake shared storage paths absolute.
OUTPUT_DIRandDATASET_DIRare resolved relative to the current working directory. The API process and background workers can start from different directories, so the same run can write artifacts to one location and later look them up in another.Suggested fix
+from pathlib import Path + +_BASE_DIR = Path(__file__).resolve().parents[2] + REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0") -OUTPUT_DIR = os.getenv("OUTPUT_DIR", "./outputs") -DATASET_DIR = os.getenv("DATASET_DIR", "./storage/datasets") +OUTPUT_DIR = str(Path(os.getenv("OUTPUT_DIR", str(_BASE_DIR / "outputs"))).resolve()) +DATASET_DIR = str(Path(os.getenv("DATASET_DIR", str(_BASE_DIR / "storage" / "datasets"))).resolve())🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/deps.py` around lines 11 - 13, OUTPUT_DIR and DATASET_DIR are currently relative and can differ between processes; change their resolution to absolute paths by converting the environment values to absolute (e.g., use os.path.abspath or Path(...).resolve()) when assigning OUTPUT_DIR and DATASET_DIR so both the API and workers reference the same filesystem location; update the assignments that set OUTPUT_DIR and DATASET_DIR to wrap the os.getenv result with the absolute-resolution call and optionally ensure directories exist (mkdir(parents=True, exist_ok=True)) after resolution.app/api/datasets_routes.py-71-76 (1)
71-76:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSurface self-instruct failures instead of silently templating.
When
_self_instruct_generatefails here, the route quietly switches to_template_generateand still returns success. Users who explicitly chose HF-backed generation will get a different dataset than requested with no warning, and token/model misconfigurations stay hidden.Suggested fix
if hf_token and req.method in ("self_instruct", "few_shot"): try: samples = _self_instruct_generate(req.user_intent, req.n_samples, req.seed_examples, hf_token) - except Exception: - samples = [] + except Exception as exc: + return { + "samples": [], + "dataset_path": "", + "stats": { + "total_generated": 0, + "final_count": 0, + "diversity_score": 0.0, + }, + "warning": f"{req.method} generation failed: {exc}", + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/datasets_routes.py` around lines 71 - 76, The route currently swallows errors from _self_instruct_generate and silently falls back to _template_generate; change this so failures are surfaced: when hf_token is present and req.method is "self_instruct" or "few_shot", catch exceptions from _self_instruct_generate, log the exception (including error details) and return a proper error response (or re-raise) instead of setting samples=[] and continuing to call _template_generate; update the datasets route handler around the hf_token branch to propagate the error to the client so token/model misconfigurations are visible and no silent fallback occurs.trainer/config.py-11-13 (1)
11-13: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winValidate
model_sourceearly.This is a free-form string today, so a typo can silently fall through to the wrong loading path downstream. Please fail fast on unsupported values instead of relying on the inline comment for enforcement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@trainer/config.py` around lines 11 - 13, Add early validation for the model_source config so typos fail fast: in trainer/config.py replace or augment the free-form model_source field (model_source) by enforcing allowed values ("hub", "local", "custom_string") — e.g., convert model_source to an Enum or add a __post_init__/validator on the config class that checks model_source and raises ValueError on unsupported values; ensure hf_token and local_model_path remain unchanged but downstream consumers now can assume model_source is one of the allowed options.trainer/loader.py-13-17 (1)
13-17:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject nonexistent local model paths.
When
model_source == "local"and the path is wrong, Line 28 flipslocal_files_onlytoFalse, sofrom_pretrainedstarts Hub resolution instead of failing locally. That turns a simple path bug into a misleading load error.Suggested guard
def _resolve_model_path(cfg: ModelConfig) -> str: """Return the model identifier to pass to from_pretrained.""" if cfg.model_source == "local" and cfg.local_model_path: return cfg.local_model_path return cfg.model_name def load_model_and_tokenizer(cfg: ModelConfig): @@ model_path = _resolve_model_path(cfg) token = cfg.hf_token or os.getenv("HF_TOKEN") or None - local_only = cfg.model_source == "local" and os.path.exists(model_path) + if cfg.model_source == "local" and not os.path.exists(model_path): + raise FileNotFoundError(f"Local model path does not exist: {model_path}") + local_only = cfg.model_source == "local"Also applies to: 28-28
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@trainer/loader.py` around lines 13 - 17, The _resolve_model_path function must reject nonexistent local paths instead of returning model_name; when ModelConfig.model_source == "local" and cfg.local_model_path is provided, verify the path exists (and is a directory/file as expected) and raise a clear ValueError if it does not; keep returning cfg.local_model_path only when the path is valid so downstream logic (e.g., setting local_files_only) won’t flip to remote Hub resolution unexpectedly.trainer/merge.py-90-100 (1)
90-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not silently downgrade a quantized export to f16.
If the quantizer binary is missing, this returns
model-f16.ggufand the caller still records the request as a successfulquant_typeexport. That breaks the deploy contract and can produce a much larger artifact than the UI asked for.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@trainer/merge.py` around lines 90 - 100, The code currently falls back to returning f16_path when no quantizer binary is found, which silently misreports a quantized export; change the behavior in the block that checks quantize_bin (symbols: quantize_bin, shutil.which("llama-quantize")/shutil.which("quantize"), f16_path, gguf_path, quant_type, subprocess.run) so that if quantize_bin is None you raise an explicit exception (e.g., RuntimeError or a custom ExportError) explaining the quantizer is missing and include quant_type in the message, rather than returning f16_path — this ensures the caller sees a failed quantization and can handle/report it instead of recording a successful quantized export.trainer/dataset.py-45-49 (1)
45-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast when the mapped dataset columns are missing.
If
instruction_coloroutput_colis wrong, Lines 46-49 skip renaming andformat_promptfalls back to empty strings. That silently trains on blank prompts/responses instead of surfacing a bad mapping.Suggested guard
# Normalise column names so format_prompt always sees "instruction" / "output" + if instruction_col not in raw.column_names: + raise ValueError(f"Instruction column '{instruction_col}' not found in dataset") + if output_col not in raw.column_names: + raise ValueError(f"Output column '{output_col}' not found in dataset") + if instruction_col != "instruction" and instruction_col in raw.column_names: raw = raw.rename_column(instruction_col, "instruction") if output_col != "output" and output_col in raw.column_names: raw = raw.rename_column(output_col, "output")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@trainer/dataset.py` around lines 45 - 49, The rename block silently ignores missing mapped columns causing format_prompt to see empty instruction/output; instead validate presence of instruction_col and output_col on raw.column_names and raise a clear error if either is missing before calling raw.rename_column (referencing variables instruction_col, output_col and method raw.rename_column). Ensure you only attempt raw.rename_column when the column exists, but if a provided mapping is not found throw an exception with a message identifying the missing column(s) so callers must fix the mapping.trainer/callbacks.py-49-57 (1)
49-57:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake Redis publishing best-effort.
A Redis outage here will raise out of the callback and can fail the whole fine-tune even though the model training itself is still healthy. Progress streaming should not sit on the critical path for job execution.
Suggested hardening
- self.redis.publish(self.channel, json.dumps(payload)) + try: + self.redis.publish(self.channel, json.dumps(payload)) + except Exception: + pass @@ - self.redis.publish(self.channel, json.dumps(payload)) + try: + self.redis.publish(self.channel, json.dumps(payload)) + except Exception: + pass🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@trainer/callbacks.py` around lines 49 - 57, Wrap Redis publishing in on_train_end (and similarly where self.redis.publish is used) in a safe try/except so a Redis outage won’t propagate and fail the training; catch broad exceptions around self.redis.publish(self.channel, json.dumps(payload)), log the error via the existing logger (or processLogger) with context (e.g., "failed to publish training status to Redis" and include exception info) and continue without re-raising; ensure the same pattern is applied to other publish calls (e.g., in the method that sets self._start_time and any progress publishes) so publishing is best-effort only.app/pages/datasets.py-279-291 (1)
279-291:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the category sidebar actually filter datasets.
Clicking a category only changes the active styling.
selected_categoryis never used when renderingSTARTER_DATASETSor when building the search request, so the sidebar is a no-op.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/pages/datasets.py` around lines 279 - 291, The category sidebar only toggles styling because selected_category is never used and on_click currently invokes DatasetState.set_category immediately; update _category_item to pass a callable (e.g., on_click=lambda *_: DatasetState.set_category(label)) so clicking sets state at click time, then use DatasetState.selected_category when rendering STARTER_DATASETS (filter the list by matching category) and include selected_category in the search request parameters where you build requests (e.g., the function that constructs/searches datasets such as build_search_request or fetch_datasets) so the UI actually filters results.app/api/jobs_routes.py-78-89 (1)
78-89:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDo not serialize HF/GitHub tokens into Celery task kwargs.
Passing tokens in the task payload stores them in the broker/backend and often exposes them to task inspection and logs. That unnecessarily broadens secret access beyond the API process.
Also applies to: 179-184, 238-243
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/jobs_routes.py` around lines 78 - 89, The task payload currently serializes sensitive HF/GitHub tokens into run_finetune.apply_async kwargs (and likewise at the other occurrences), which risks exposing secrets in the broker/backend; remove any token fields from the kwargs passed to run_finetune.apply_async and instead have the worker retrieve credentials securely at runtime (e.g., fetch from a secrets store, environment variable, or a secure DB keyed by job_id) inside the run_finetune task handler, using job_id (and existing model_cfg/lora_cfg/train_cfg identifiers) to look up any per-job secrets; update the other apply_async calls you referenced (the ones at the other occurrences) to follow the same pattern so tokens are never serialized in Celery messages.app/state/experiment_state.py-53-70 (1)
53-70:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist
loss_historythrough reloads.
save_experiment_run()writesloss_history, but the restored model drops it entirely, so reloaded experiments lose the epoch/loss series this feature is supposed to preserve.Suggested fix
class ExperimentRun(BaseModel): id: str = "" name: str = "" model_id: str = "" model_source: str = "hub" technique: str = "qlora" epochs: int = 3 learning_rate: str = "2e-4" lora_r: int = 16 batch_size: int = 4 dataset_name: str = "" user_intent: str = "" final_loss: float = 0.0 perplexity: float = 0.0 started_at: str = "" finished_at: str = "" status: str = "unknown" output_path: str = "" + loss_history: list[dict[str, Any]] = [] @@ ExperimentRun( id=r["id"], name=r["name"] or "", model_id=r["model_id"] or "", model_source=r["model_source"] or "hub", technique=r["technique"] or "qlora", epochs=r["epochs"] or 3, learning_rate=r["learning_rate"] or "2e-4", lora_r=r["lora_r"] or 16, batch_size=r["batch_size"] or 4, dataset_name=r["dataset_name"] or "", user_intent=r["user_intent"] or "", final_loss=r["final_loss"] or 0.0, perplexity=r["perplexity"] or 0.0, started_at=r["started_at"] or "", finished_at=r["finished_at"] or "", status=r["status"] or "unknown", output_path=r["output_path"] or "", + loss_history=json.loads(r["loss_history"] or "[]"), )Also applies to: 94-113
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/experiment_state.py` around lines 53 - 70, The ExperimentRun model is missing a persisted loss_history so reloads drop epoch/loss series; add a loss_history field (e.g., List[float] or List[dict] depending on how epochs are represented) to the ExperimentRun Pydantic model and to the other run model defined later, then update save_experiment_run and the corresponding load/restore routine to include loss_history in serialization and deserialization so saved runs restore their epoch/loss series intact.app/api/jobs_routes.py-127-136 (1)
127-136:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid in-memory ZIP assembly for model artifacts.
BytesIOforces the entire archive to live in RAM before the response starts. That is especially risky for/download-merged, where the artifact can be multi-GB and take down the API worker.Also applies to: 199-208
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/jobs_routes.py` around lines 127 - 136, The current code builds the ZIP in-memory using io.BytesIO (buf) and zipfile.ZipFile which risks OOM for multi-GB artifacts; change to produce the ZIP on-disk or stream it incrementally instead: create a temporary file (e.g., tempfile.NamedTemporaryFile) or use a streaming zip iterator and open ZipFile writing to that file/stream while yielding chunks to the client, then return a StreamingResponse or FileResponse that streams the temporary ZIP with the same headers; update the adapter download path that currently references adapter_dir, job_id and zipfile.ZipFile to use the on-disk/temp-file or chunked generator approach and ensure the temp file is cleaned up after streaming.app/state/experiment_state.py-139-172 (1)
139-172:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not silently ignore failed experiment writes.
If SQLite or JSON serialization fails here, the caller still proceeds as if the run was saved. That turns DB issues into silent data loss for a core persistence path.
Suggested fix
def save_experiment_run(run_data: dict[str, Any]): """Called from FinetuneState._save_experiment_record() — writes to SQLite.""" - try: - _init_db() - with _get_conn() as conn: - conn.execute( - """ - INSERT OR REPLACE INTO runs - (id, name, model_id, model_source, technique, epochs, learning_rate, - lora_r, batch_size, dataset_name, user_intent, final_loss, perplexity, - started_at, finished_at, status, output_path, loss_history) - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - run_data.get("id", ""), - run_data.get("name", ""), - run_data.get("model_id", ""), - run_data.get("model_source", "hub"), - run_data.get("technique", "qlora"), - run_data.get("epochs", 3), - run_data.get("learning_rate", "2e-4"), - run_data.get("lora_r", 16), - run_data.get("batch_size", 4), - run_data.get("dataset_name", ""), - run_data.get("user_intent", ""), - run_data.get("final_loss", 0.0), - run_data.get("perplexity", 0.0), - run_data.get("started_at", ""), - run_data.get("finished_at", ""), - run_data.get("status", "unknown"), - run_data.get("output_path", ""), - json.dumps(run_data.get("loss_history", [])), - ), - ) - except Exception: - pass + _init_db() + with _get_conn() as conn: + conn.execute( + """ + INSERT OR REPLACE INTO runs + (id, name, model_id, model_source, technique, epochs, learning_rate, + lora_r, batch_size, dataset_name, user_intent, final_loss, perplexity, + started_at, finished_at, status, output_path, loss_history) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + run_data.get("id", ""), + run_data.get("name", ""), + run_data.get("model_id", ""), + run_data.get("model_source", "hub"), + run_data.get("technique", "qlora"), + run_data.get("epochs", 3), + run_data.get("learning_rate", "2e-4"), + run_data.get("lora_r", 16), + run_data.get("batch_size", 4), + run_data.get("dataset_name", ""), + run_data.get("user_intent", ""), + run_data.get("final_loss", 0.0), + run_data.get("perplexity", 0.0), + run_data.get("started_at", ""), + run_data.get("finished_at", ""), + run_data.get("status", "unknown"), + run_data.get("output_path", ""), + json.dumps(run_data.get("loss_history", [])), + ), + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/experiment_state.py` around lines 139 - 172, The current try/except around the DB insert (using _init_db(), _get_conn(), and run_data into the runs table) swallows all exceptions causing silent data loss; change it to catch Exception as e, log the error (including e and identifying run_data.get("id")/name) using the module logger, and then propagate the failure to the caller (re-raise the exception or return a failure status) so callers do not proceed as if the run was saved; keep the insert logic and JSON serialization for loss_history but do not suppress errors silently.app/api/experiments_routes.py-13-15 (1)
13-15:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the same DB path source as
ExperimentState.This fallback points to
./storage/experiments.db, whileapp/state/experiment_state.pywrites to a project-root absolute path. If the API starts from a different working directory,/experimentswill query and delete from a different database.Suggested fix
-def _get_db_path() -> str: - return os.getenv("EXPERIMENT_DB", "./storage/experiments.db") +def _get_db_path() -> str: + from app.state.experiment_state import DB_PATH + return DB_PATH🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/experiments_routes.py` around lines 13 - 15, The _get_db_path function currently falls back to a relative "./storage/experiments.db" which differs from ExperimentState's absolute project-root path; update _get_db_path to obtain the DB path from the same source as ExperimentState (e.g., import ExperimentState and call its public accessor or constant such as ExperimentState.get_db_path() or ExperimentState.DB_PATH) so both API routes and ExperimentState target the identical absolute path instead of a relative fallback.app/pages/finetune.py-1137-1173 (1)
1137-1173:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis table is showing all completed runs, not this setup's runs.
ExperimentState.completed_runsonly filters onstatus == "done"inapp/state/experiment_state.py:84-85. The "Past runs with this setup" table therefore mixes unrelated models, datasets, and techniques, which makes the comparison misleading. Filter against the current run configuration before rendering it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/pages/finetune.py` around lines 1137 - 1173, The table currently iterates over ExperimentState.completed_runs (which only filters by status) and thus shows unrelated runs; change the rendering to first compute a filtered list of completed runs that match the current run's setup (compare identifying fields on ExperimentState.current_run such as model_id, dataset_id, and any setup_hash/technique/hyperparams) and pass that filtered list to rx.foreach instead of ExperimentState.completed_runs; e.g., produce a filtered_runs = ExperimentState.completed_runs.filter(lambda r: r.model_id == ExperimentState.current_run.model_id and r.dataset_id == ExperimentState.current_run.dataset_id [and r.setup_hash == ExperimentState.current_run.setup_hash or compare technique/hyperparams as needed]) and use filtered_runs in the rx.foreach call that builds the table rows.app/pages/finetune.py-1107-1115 (1)
1107-1115:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix
on_key_downto not block typing on non-Enter keys
rx.prevent_defaultis executed for every non-"Enter"key, which cancels the browser’s default keydown behavior and can prevent characters from being inserted into the input. Reflex’son_key_downhandler receives the pressed key as a string, so keep the"Enter"check but applyrx.prevent_defaultonly in the Enter branch (or remove it entirely) and let other keys fall through.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/pages/finetune.py` around lines 1107 - 1115, The on_key_down handler on the rx.input is calling rx.prevent_default for every non-"Enter" key which blocks normal typing; update the lambda so rx.prevent_default is only invoked when the key is "Enter" (or remove it entirely) and let other keys fall through to normal behavior; locate the rx.input using FinetuneState.chat_input, FinetuneState.set_chat_input and FinetuneState.send_test_chat and adjust the on_key_down conditional so the Enter branch runs send_test_chat (and optionally prevent_default) while the non-Enter branch does nothing.app/state/finetune_state.py-717-725 (1)
717-725:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSave the experiment after auto-eval finishes, not before it starts.
This persists the run before
_auto_eval()populatesself.eval_perplexity, so successful runs are stored with a placeholder perplexity of0.0.Also applies to: 766-787
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 717 - 725, The experiment record is being saved before `_auto_eval()` runs so `self.eval_perplexity` remains at its placeholder; update the flow in the block that checks `self.training_status == "done"` (and the similar block around the second occurrence) to run `await self._auto_eval()` first, wait for it to complete and populate `self.eval_perplexity`, then call `await self._save_experiment_record()`; keep the `async with self:` that sets `self.current_step = 6` as needed but ensure the final save happens after `_auto_eval()` finishes so persisted records contain the real perplexity.app/state/finetune_state.py-333-344 (1)
333-344:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize Hub preview rows to the shared table schema before storing them.
_preview_tableonly readsinstructionandoutput, but this code keeps the API's raw preview rows. Datasets with different column names will render fine in/datasetsand then break here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 333 - 344, The hub preview rows are stored raw but must be normalized to the shared table schema so _preview_table (which expects "instruction" and "output") won't break; after parsing data in the block that sets self.hub_dataset_columns and before assigning self.hub_dataset_preview, map each row (using the returned columns list from data.get("columns", [])) into a normalized dict with keys "instruction" and "output" by locating the column indices for the detected instruction/output names (or falling back to self.hub_dataset_instruction_col/self.hub_dataset_output_col) and building rows like {"instruction": row[idx_inst], "output": row[idx_out]} (or empty string if missing); then assign that normalized list to self.hub_dataset_preview and continue setting self.hub_dataset_instruction_col and self.hub_dataset_output_col as now.app/state/finetune_state.py-418-435 (1)
418-435:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist the detected upload column mapping into the training payload.
The preview remaps arbitrary upload columns to
instruction/output, but the job request still sends the Hub column fields. That means a local CSV/JSON upload with columns likepromptandanswerwill preview fine here and then fail oncetrainer/finetune.pytries to readinstruction_col/output_col.Also applies to: 602-606
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 418 - 435, The preview remapping detects which upload columns map to instruction/output but doesn't persist that mapping into the training job payload; update the state to store the detected names (e.g., set properties like self.instruction_col and self.output_col or equivalent fields used when building the job request) in the same block that computes inst_col/out_col so the trainer uses those values; also apply the same persistence change in the later block around the other preview code (lines referenced in the review) so both CSV/JSON upload paths include the mapped column names when constructing the payload sent to trainer/finetune.py.app/state/finetune_state.py-293-295 (1)
293-295:⚠️ Potential issue | 🟠 Major | ⚡ Quick winZIP uploads need to be extracted before treating the directory as a model path.
For
.zipfiles this stores the archive, then pointslocal_model_pathat the parent directory without unpacking it. The training path will not contain model files yet.Proposed fix
+ import zipfile + - self.local_model_path = dest_dir if file.filename.endswith(".zip") else dest_path + if file.filename.endswith(".zip"): + with zipfile.ZipFile(dest_path) as archive: + archive.extractall(dest_dir) + self.local_model_path = dest_dir + else: + self.local_model_path = dest_path🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 293 - 295, The current assignment in FinetuneState (setting self.local_model_path = dest_dir if file.filename.endswith(".zip") else dest_path) points to the ZIP archive's parent directory without extracting it; update the upload handling so that when file.filename endswith(".zip") you extract the ZIP into dest_dir (e.g., using Python's zipfile) and then set self.local_model_path to the extraction directory (not the zip or its parent), keeping self.model_source = "local" and self.is_validating_model = False; modify the logic around dest_dir/dest_path and the ZIP branch in the method that sets local_model_path to ensure training receives the extracted model files.app/state/finetune_state.py-578-635 (1)
578-635:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
start_trainingagainst duplicate submissions.There is no re-entry check here, so a fast double-click can POST multiple jobs before the UI reflects
is_starting.Proposed fix
async def start_training(self): - if not self.can_start_training: + if self.is_starting or self.training_status == "running" or not self.can_start_training: return🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 578 - 635, The start_training method can be re-entered before the UI/state updates, so add an early atomic re-entry guard: inside start_training, acquire the existing instance lock (async with self) immediately, check if self.is_starting or self.training_status == "running" (or !self.can_start_training) and return if true, then set self.is_starting = True and other initial fields (experiment_id/name, training_start_time, etc.) while still holding the lock to prevent a race; release the lock and proceed to build payload and POST. Also ensure all exception paths reset self.is_starting = False (and set start_error) inside the lock so a failed attempt doesn't permanently block future starts. Use the existing symbols start_training, is_starting, can_start_training, training_status, and the async with self lock to implement this.app/state/finetune_state.py-660-699 (1)
660-699:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle terminal pubsub messages before appending loss samples.
The Redis
donepayload does not includelossorepoch, but this loop still appends a zero-valued point before breaking. That will reset the visible chart tail and poison the savedfinal_loss.Proposed fix
- current_loss = data.get("loss", 0) - current_epoch = data.get("epoch", 0) + status = data.get("status") + if status in ("done", "failed"): + async with self: + self.training_status = status + break + + if "loss" not in data: + continue + + current_loss = data["loss"] + current_epoch = data.get("epoch", 0) @@ - if data.get("status") in ("done", "failed"): - async with self: - self.training_status = data.get("status", "done") - break🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/state/finetune_state.py` around lines 660 - 699, The loop currently computes current_loss/current_epoch and immediately appends to self.loss_history (see variables current_loss, current_epoch and the append into self.loss_history) even when the pubsub payload is a terminal message with status "done" or "failed", which causes a bogus zero-valued sample; move the terminal-status check (data.get("status") in ("done","failed")) to occur before computing/appending loss samples and before updating epoch/epoch_start_loss so that when a terminal message arrives you only set self.training_status (via async with self: self.training_status = ...) and break without mutating loss_history, epoch_start_loss, prev_epoch, or other training metrics. Ensure the rest of the epoch-boundary logic still runs for non-terminal messages.app/components/finetune/step4_configure.py-35-72 (1)
35-72:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHide the simple form when Advanced mode is enabled.
The toggle currently only adds the advanced card; it never removes the simple one, so advanced mode renders both control sets at once.
Proposed fix
- # Simple mode - _card( - rx.vstack( - rx.grid( - rx.vstack( - _label("Epochs"), - rx.input(value=FinetuneState.epochs.to_string(), - on_change=FinetuneState.set_epochs, - type="number", width="100%"), - rx.text("One full pass through your dataset", - font_size="0.72rem", color=c("text_muted")), - spacing="1", - ), - rx.vstack( - _label("Learning rate"), - rx.select.root( - rx.select.trigger(width="100%"), - rx.select.content( - *[rx.select.item(f"{lr} — {desc}", value=lr) - for lr, desc in _LR_PRESETS], - ), - value=FinetuneState.learning_rate, - on_change=FinetuneState.set_learning_rate, - ), - spacing="1", - ), - rx.vstack( - _label("Technique"), - rx.text(FinetuneState.technique_label, font_size="0.88rem", - font_weight="500", color=c("accent")), - rx.text("Change in Step 1", font_size="0.72rem", color=c("text_muted")), - spacing="1", - ), - columns="3", spacing="4", width="100%", - ), - spacing="0", - ) - ), + # Simple mode + rx.cond( + FinetuneState.ui_mode != "advanced", + _card( + rx.vstack( + rx.grid( + rx.vstack( + _label("Epochs"), + rx.input(value=FinetuneState.epochs.to_string(), + on_change=FinetuneState.set_epochs, + type="number", width="100%"), + rx.text("One full pass through your dataset", + font_size="0.72rem", color=c("text_muted")), + spacing="1", + ), + rx.vstack( + _label("Learning rate"), + rx.select.root( + rx.select.trigger(width="100%"), + rx.select.content( + *[rx.select.item(f"{lr} — {desc}", value=lr) + for lr, desc in _LR_PRESETS], + ), + value=FinetuneState.learning_rate, + on_change=FinetuneState.set_learning_rate, + ), + spacing="1", + ), + rx.vstack( + _label("Technique"), + rx.text(FinetuneState.technique_label, font_size="0.88rem", + font_weight="500", color=c("accent")), + rx.text("Change in Step 1", font_size="0.72rem", color=c("text_muted")), + spacing="1", + ), + columns="3", spacing="4", width="100%", + ), + spacing="0", + ) + ), + rx.fragment(), + ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/components/finetune/step4_configure.py` around lines 35 - 72, The simple-mode _card should be rendered only when advanced mode is disabled; wrap the simple form block (the _card(...) that contains the Epochs/LR/Technique grid) in a conditional that checks the FinetuneState advanced flag (e.g., render only if not FinetuneState.advanced or FinetuneState.is_advanced === false). Update the JSX/reactive expression so the simple card is omitted when the advanced toggle is on, using the same FinetuneState property used by the advanced card/toggle to ensure it hides immediately when advanced mode is enabled.
🟡 Minor comments (1)
app/pages/datasets.py-67-83 (1)
67-83:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon't render the curated grid after an empty or failed search.
A zero-result search and any request error both leave
search_resultsempty, so the UI falls straight back to the starter cards. That makes it look like those curated datasets matched the query instead of showing “no results” or “search failed.”Also applies to: 344-365
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/pages/datasets.py` around lines 67 - 83, When a search completes (success, zero-results, or exception), mark that a search was performed and distinguish failures from no-results so the curated grid isn't shown; inside the async search block update self.search_performed = True in the success and except branches, set self.search_results to the returned list on 200, set self.search_results = [] when the API returns zero results, and on exception set self.search_error = True (and leave search_results as None or empty) while always clearing self.is_searching; update the UI to check self.search_performed / self.search_error (instead of only truthiness of self.search_results) before rendering the curated grid.
🧹 Nitpick comments (1)
app/pages/finetune.py (1)
28-149: ⚡ Quick winReuse the shared progress bar component instead of re-implementing it here.
app/components/finetune/progress_bar.pyalready defines the same_STEP_LABELS,_step_dot(), and_progress_bar()logic. Keeping a second copy in this page guarantees drift the next time the wizard steps or styles change.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/pages/finetune.py` around lines 28 - 149, This file duplicates the progress bar implementation (_STEP_LABELS, _step_dot, _progress_bar) which already lives in app/components/finetune/progress_bar.py; remove the local copies and import the shared components instead. Replace the local symbols _STEP_LABELS, _step_dot, and _progress_bar with imports from the shared module (e.g., from app.components.finetune.progress_bar import _STEP_LABELS, _step_dot, _progress_bar) and ensure any references to FinetuneState remain compatible; if the shared module exposes different names, adapt the import names or use aliases to match the existing usages in this file. Ensure no other local references rely on removed helpers before committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b6b92205-a57a-4efe-8ffc-fbc488e83d5c
📒 Files selected for processing (34)
README.mdapp/api.pyapp/api/__init__.pyapp/api/datasets_routes.pyapp/api/deps.pyapp/api/experiments_routes.pyapp/api/jobs_routes.pyapp/api/models_routes.pyapp/api/schemas.pyapp/api/system.pyapp/components/finetune/progress_bar.pyapp/components/finetune/shared.pyapp/components/finetune/step1_model.pyapp/components/finetune/step2_intent.pyapp/components/finetune/step3_data.pyapp/components/finetune/step4_configure.pyapp/components/loss_chart.pyapp/pages/datasets.pyapp/pages/finetune.pyapp/pages/results.pyapp/pages/training.pyapp/state/experiment_state.pyapp/state/finetune_state.pyapp/state/job_state.pyapp/state/model_state.pyrxconfig.pytrainer/callbacks.pytrainer/config.pytrainer/dataset.pytrainer/finetune.pytrainer/loader.pytrainer/merge.pyworkers/merge_task.pyworkers/train_task.py
💤 Files with no reviewable changes (2)
- app/api.py
- app/state/model_state.py
| rx.slider(min=4, max=128, step=4, | ||
| default_value=[FinetuneState.lora_r], | ||
| on_value_commit=FinetuneState.set_lora_r), |
There was a problem hiding this comment.
Make the LoRA sliders controlled (use value instead of default_value).
In app/pages/finetune.py, the sliders at the cited lines use default_value=... (e.g., the one wiring on_value_commit=FinetuneState.set_lora_r), which leaves the thumb uncontrolled and only initializes from state once; it won’t reliably reflect updated FinetuneState after restore/back-navigation. Switch to value=[...] bound to the corresponding FinetuneState fields.
Also applies to the similar slider block around the other cited lines (730-732).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/pages/finetune.py` around lines 714 - 716, The LoRA sliders are using
default_value which only initializes the thumb and won’t reflect later state
changes; change the slider args to use value bound to the FinetuneState fields
(e.g., replace default_value=[FinetuneState.lora_r] with
value=[FinetuneState.lora_r]) for the slider wired to
on_value_commit=FinetuneState.set_lora_r, and do the same for the other similar
LoRA slider block that commits to a FinetuneState setter (bind value to the
corresponding FinetuneState.<field> property so the slider stays controlled and
updates on restore/navigation).
There was a problem hiding this comment.
Deferred to a follow-up issue: the slider fix requires verifying the Reflex slider API for the installed version (rx.slider value vs default_value semantics differ across Reflex releases). The change is low-risk but needs a UI smoke test before merging. Filed as a follow-up.
There was a problem hiding this comment.
@SahilKumar75 That's a sensible approach — Reflex's slider API has indeed shifted between minor versions, and a quick smoke test before committing to value=[...] is the right call. Glad it's tracked. Feel free to link the follow-up issue here when it's ready if you'd like me to keep an eye on it.
- datasets_routes: trust_remote_code=False for user-supplied dataset IDs - trainer/dataset: trust_remote_code=False + fail-fast on missing columns - jobs_routes: _resolve_job_dir() blocks path-traversal in all job_id joins - jobs_routes: per-job Lock + double-checked load for _INFER_CACHE - finetune_state: os.path.basename() on uploaded filenames (model + dataset) - finetune_state: extract ZIP uploads before setting local_model_path - merge_task: validate repo_url is https://github.com/, supply token via GIT_ASKPASS credential helper (never in URL/args), scrub token from errors - hf_spaces: add missing nginx.conf and entrypoint.sh (Docker build fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GitHub-hosted runners only have ~14 GB free. Installing the full ML stack (torch, transformers, bitsandbytes, trl, peft, accelerate) fills it up mid-build. - Add jlumbroso/free-disk-space to reclaim ~25 GB (Android SDK, .NET, Haskell, large packages, cached Docker images, swap) - Change load: true → load: false — CI only needs to verify the build succeeds, not load the image into the local daemon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…actor (#44) Squash-merges feat/open-source-refactor. All CI checks green.
Summary
ExperimentStatepersists every run with full loss history, perplexity, and metadataapp/api.pymonolith intoapp/api/with 7 domain-scoped modulesrx.Basewas removed in current Reflex; replaced withpydantic.BaseModelrxconfig.pyapp/components/finetune/step1–4 extracted as reusable componentsCloses
Test plan
poetry run python -c "from app.app import app; print('OK')"— import cleanpoetry run reflex run— starts without traceback or SitemapPlugin warning🤖 Generated with Claude Code
Summary by CodeRabbit