
Conversation

@Luodian (Contributor) commented Jan 15, 2026

Before you open a pull request, please check whether a similar issue already exists or has been closed before.

When you open a pull request, please be sure to include the following:

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you encounter lint warnings, you can use the following commands to reformat the code:

pip install pre-commit
pre-commit install
pre-commit run --all-files

Ask for review

Once you feel comfortable with your PR, feel free to @ one of the contributors for review.

General: @Luodian @kcz358 @pufanyi
Audio: @pbcong @ngquangtrung57

Thank you for your contributions!

@Luodian (Contributor, Author) commented Jan 15, 2026

@claude

claude bot commented Jan 15, 2026

Claude finished @Luodian's task in 2m 20s


Code Review Complete ✅

Reviewed PR #995: TUI Interface Implementation

This is an impressive feature addition: a Terminal User Interface (TUI) for interactive configuration of lmms-eval. Here's my comprehensive review:


🎯 Overview

Strengths:

  • Well-structured TUI implementation using Textual framework
  • Smart early detection of --tui flag to avoid heavy imports
  • Comprehensive configuration UI with model, task, and settings tabs
  • Good separation of concerns (app.py, discovery.py)

Critical Issues to Address:

1. Files in Wrong Location ⚠️ BLOCKER

Two files are placed in the repository root instead of the proper location:

  • /home/runner/work/lmms-eval/lmms-eval/core.py
  • /home/runner/work/lmms-eval/lmms-eval/search.py

Issue: These files appear to be for UI/UX BM25 search (based on content), but they're:

  1. In the wrong location (repo root, not in a proper module)
  2. Not related to lmms-eval's core functionality
  3. Possibly included by accident from another project

Action Required:

  • If these files are needed for TUI, move them to lmms_eval/tui/
  • If they're not needed, remove them entirely
  • Update any imports accordingly

Fix this →

2. Missing Type Hints ❌ REQUIRED

Per CLAUDE.md line 27: "Type hints required for all code"

Files missing comprehensive type hints:

  • lmms_eval/tui/app.py - Many methods lack return type annotations
  • lmms_eval/tui/discovery.py - Some functions missing type hints

Examples from app.py:

# Line 63-69: Missing return type
def _strip_ansi(text: str):  # Should be: -> str
    global _ANSI_ESCAPE_RE
    ...

# Line 72-93: Missing return type  
def _highlight_bash_output(text: str):  # Should be: -> str
    ...

# Line 96-158: Missing return type
def _highlight_bash_syntax(line: str):  # Should be: -> str
    ...

Fix this →

3. Missing Docstrings ❌ REQUIRED

Per CLAUDE.md line 28: "Public APIs must have docstrings"

Missing docstrings in:

  • lmms_eval/tui/discovery.py:10 - discover_tasks() function
  • lmms_eval/tui/discovery.py:87 - discover_models() function
  • lmms_eval/tui/discovery.py:118 - get_popular_tasks() function
  • lmms_eval/tui/discovery.py:133 - get_popular_models() function
  • lmms_eval/tui/app.py - Most helper functions and classes

Fix this →
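A hypothetical sketch of the expected docstring style, using discover_tasks() as an example; the signature and return type below are assumptions and should match the actual implementation:

def discover_tasks() -> dict[str, dict]:
    """Discover registered tasks by scanning task YAML configs.

    Returns:
        A mapping from task name to its parsed metadata (e.g. group, tags).
    """
    ...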

4. Code Complexity ⚠️

lmms_eval/tui/app.py is 1,528 lines - this violates the principle in CLAUDE.md line 29: "Functions must be focused and small"

Recommendation: Consider splitting app.py into:

  • app.py - Main app class and entry point
  • screens.py - Screen classes (WelcomeScreen, ConfigScreen)
  • widgets.py - Custom widget components
  • utils.py - Helper functions (_strip_ansi, _highlight_bash_syntax, etc.)

5. Missing Tests ⚠️

Per CLAUDE.md line 37: "New features require tests"

No tests found for the new TUI functionality. At minimum, add:

  • Unit tests for discovery functions
  • Integration tests for config building
  • Tests for command generation logic
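As a starting point, a minimal pytest sketch for the discovery functions could look like the following; the test names and assertions are assumptions and should be adjusted to the actual return types:

from lmms_eval.tui.discovery import discover_models, discover_tasks


def test_discover_tasks_returns_entries():
    # Assumes discover_tasks() returns a non-empty collection of task entries.
    tasks = discover_tasks()
    assert len(tasks) > 0


def test_discover_models_returns_entries():
    # Assumes discover_models() returns a non-empty collection of model entries.
    models = discover_models()
    assert len(models) > 0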

6. Hardcoded Values ⚠️

lmms_eval/main.py:13

print("TUI mode requires 'textual' package. Install with: pip install lmms_eval[tui]")

Should use uv per CLAUDE.md line 19: "ONLY use uv, NEVER pip"

print("TUI mode requires 'textual' package. Install with: uv pip install -e .[tui]")

7. Dependency Management ✅ GOOD

pyproject.toml changes look good:

  • Proper use of optional dependencies [tui]
  • Added lmms-eval-tui entry point
  • Includes textual-image and textual-plotext

However, noticed in pyproject.toml:

[dependency-groups]
dev = [
    "textual>=7.2.0",  # Line 195: Version conflict?
    "textual-dev>=1.8.0",
]

vs

tui = [
    "textual>=0.47.0",  # Line 127: Different version
    ...
]

Question: Is textual>=7.2.0 in dev correct, or should it match the tui requirement?


📝 Detailed File Review

lmms_eval/__main__.py ✅ GOOD APPROACH

Lines 4-15: Smart early TUI detection before heavy imports - excellent for startup performance!

Lines 18-52: Clean quick-help display when no args provided

Issues:

  • Line 13: Use uv instead of pip in error message
  • Consider adding type hints to new functions

lmms_eval/protocol.py

Modified to add type definitions for chat content. Changes look reasonable but would benefit from review of:

  • Are all the new types (ChatTextContent, ChatImageContent, etc.) actually used by TUI?
  • If not, this might be scope creep for this PR

lmms_eval/api/metrics.py

Minimal changes, looks fine.

lmms_eval/tui/app.py - NEEDS WORK

Good aspects:

  • Clean separation of screens (WelcomeScreen, ConfigScreen)
  • Responsive design with dynamic tab loading
  • Good use of Textual's reactive system
  • Bash syntax highlighting is a nice touch

Issues:

  • Missing type hints throughout
  • No docstrings
  • File is too large (1,528 lines) - needs refactoring
  • Global variable usage (_ANSI_ESCAPE_RE, _CACHED_DEVICE)
  • Some functions could be simplified

Security note: Lines 183-207 use subprocess.run() with user input; the call appears safe (no shell=True), but verify all subprocess calls.

lmms_eval/tui/discovery.py ✅ MOSTLY GOOD

Clean implementation of task/model discovery with caching.

Issues:

  • Missing type hints on some functions
  • Missing docstrings
  • Exception handling is too broad (bare except: on lines 30, 68, 107)

Better exception handling:

except (IOError, yaml.YAMLError) as e:
    eval_logger.debug(f"Failed to parse {yaml_path}: {e}")

core.py and search.py ❌ WRONG LOCATION

These files implement a BM25 search engine for UI/UX style guides - they appear to be from a different project or accidentally included. They should not be in the repo root.


🔍 Potential Bugs

  1. lmms_eval/tui/app.py:211 - _CACHED_DEVICE global could cause issues in multi-threaded scenarios (a thread-safe caching alternative is sketched after this list)
  2. lmms_eval/tui/discovery.py:30 - Bare except could hide real errors
  3. lmms_eval/main.py:112 - Using undefined logging module (should be eval_logger)
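For the first point, one possible alternative to the module-level _CACHED_DEVICE global is functools.lru_cache; the detection logic below is illustrative only and should mirror whatever app.py currently does:

import functools


@functools.lru_cache(maxsize=1)
def detect_device() -> str:
    """Detect the available accelerator once and reuse the cached result."""
    try:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"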

✅ What's Working Well

  1. Feature completeness - Comprehensive TUI with all major config options
  2. User experience - Auto-countdown, keyboard shortcuts, visual feedback
  3. Performance - Smart lazy loading of tabs, early --tui detection
  4. Dependencies - Proper optional dependency handling
  5. Integration - Clean integration with existing CLI

📋 Checklist Before Merge

  • Remove or relocate core.py and search.py from repo root
  • Add type hints to all functions
  • Add docstrings to public APIs
  • Run uv run ruff format .
  • Run uv run ruff check . --fix
  • Run uv run pyright and fix type errors
  • Consider refactoring app.py into smaller modules
  • Add unit tests for discovery logic
  • Fix pip reference to use uv
  • Fix bare except clauses
  • Fix undefined logging reference
  • Verify textual version consistency in pyproject.toml

🎉 Summary

This is a valuable feature that will significantly improve the user experience for configuring lmms-eval. The core implementation is solid, but it needs refinement to meet the project's code quality standards before merging.

Priority fixes: File location, type hints, docstrings, and running formatters/linters.

Estimated effort: 2-4 hours to address all issues.

Great work on the TUI implementation! The user experience looks polished. Just needs some cleanup to match the codebase standards.

Branch: feat/tui-interface

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61c3158abc


Comment on core.py, lines 1 to 5
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
UI/UX Pro Max Core - BM25 search engine for UI/UX style guides
"""


P1: Avoid shadowing the external core package

Adding a top-level core.py introduces a name collision with the core package that the PLM model imports (from core.args ... in lmms_eval/models/simple/plm.py). When running from the repo root (or any cwd on sys.path), Python will resolve core to this new module, which is not a package, and core.args will fail with ModuleNotFoundError: 'core' is not a package. That makes --model plm unusable. Renaming/moving this helper module (e.g., under lmms_eval/ or a less generic name) avoids breaking the existing model import path.


Comment on lines 1030 to 1034
text=True,
bufsize=1,
env=env,
shell=True,
start_new_session=True,


P2: Avoid a default activation command that fails under /bin/sh

The TUI executes the composed command with shell=True (defaulting to /bin/sh on Linux), but the default activate_cmd is source .venv/bin/activate, which is a bashism and is not supported by /bin/sh on common distros (e.g., dash). That means the default “START” action fails immediately unless the user manually edits the activation command. Consider using a POSIX-compatible activation (. .venv/bin/activate) or running the shell with executable='/bin/bash' to make the default work.
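A minimal sketch of the second option, keeping shell=True but running under bash so that `source` works; the command string is a placeholder, not the actual composed command:

import subprocess

cmd = "source .venv/bin/activate && lmms-eval --model <model> --tasks <task>"  # placeholder
proc = subprocess.Popen(
    cmd,
    shell=True,
    executable="/bin/bash",  # use bash rather than /bin/sh so `source` is supported
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    start_new_session=True,
)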


@kcz358 (Collaborator) commented Jan 15, 2026

I turned off my opencode's auto formatter lol since we have our own pre-commit lint hook to do so.

- Remove OpenTUI terminal-based UI (had rendering issues)
- Add React + Vite + Tailwind CSS web UI
- FastAPI backend serves both API and static files
- CLI starts server and opens browser automatically

Features:
- Model selection dropdown
- Task list with search/filter and checkboxes
- Real-time command preview
- Live output streaming via SSE
- Start/Stop evaluation controls
- Settings: batch size, limit, device, verbosity, output path
@Luodian force-pushed the feat/tui-interface branch from e4a0ccf to d4c448e on January 15, 2026 at 13:39
- Add shell syntax highlighting for command preview and env vars editor
- Add ANSI color code parsing for log output
- Make Tasks and Environment Variables sections collapsible
- Unify typography with monospace font and consistent sizing
- Add search functionality to Select dropdown component
- Remove broken INVERT button
- Add group collapse/expand controls in task list
- Make log output maximizable
@Luodian (Contributor, Author) commented Jan 16, 2026

Web UI Feature Summary

Why This Feature?

Running LMMs-Eval evaluations currently requires constructing complex command-line arguments manually, which can be error-prone and difficult for users unfamiliar with all available options. The Web UI provides an intuitive interface that:

  • Lowers the barrier to entry - Users can explore available models and tasks visually without memorizing command syntax
  • Prevents configuration errors - Real-time command preview shows exactly what will be executed
  • Improves debugging experience - Live log streaming eliminates the need to monitor terminal output in a separate window
  • Enables quick iteration - Start/stop controls allow rapid testing of different configurations

Core Features

Model & Task Discovery: Browse 70+ registered models with a searchable dropdown. The task list supports search/filter with a hierarchical group display for easy benchmark discovery.

Real-time Command Preview: Configuration changes instantly update the generated command, making it easy to understand the CLI equivalent and copy for later use.

Live Log Streaming: Evaluation output streams in real-time via Server-Sent Events (SSE). No page refresh needed - watch progress as it happens.

Execution Control: One-click Start/Stop buttons with proper state management. Status indicator shows ready/running/error states.

How to Use

# Start the Web UI (opens browser automatically)
uv run lmms-eval-ui

# Custom port
LMMS_SERVER_PORT=3000 uv run lmms-eval-ui

The server starts on http://localhost:8000 by default and automatically opens your browser.

Architecture

  • Frontend: React + Vite + Tailwind CSS (pre-built, served as static files)
  • Backend: FastAPI with SSE for real-time log streaming
  • Auto-build: Frontend builds automatically on first run if Node.js 18+ is available
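For reference, a minimal sketch of how SSE log streaming can be wired up in FastAPI; the endpoint path, queue, and wiring are assumptions, not the actual backend code:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
log_queue: asyncio.Queue = asyncio.Queue()  # filled by the running evaluation


@app.get("/api/logs")
async def stream_logs() -> StreamingResponse:
    async def event_stream():
        while True:
            line = await log_queue.get()
            yield f"data: {line}\n\n"  # one SSE frame per log line

    return StreamingResponse(event_stream(), media_type="text/event-stream")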

Screenshots

  • Main Interface - Configuration Panel (screenshot)
  • Model Selection Dropdown (screenshot)
  • Live Log Streaming (screenshot)


Tested Functionality

  • Model selection and search
  • Task selection with search/filter
  • Real-time command preview generation
  • Start evaluation with live log streaming
  • Stop button state management (enabled only during running)
  • Error state handling and display
  • Clear log functionality
