
Conversation

@Luodian (Contributor) commented Jan 15, 2026

Before you open a pull request, please check whether a similar issue already exists or has been closed before.

When you open a pull request, please be sure to include the following:

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you encounter lint warnings, you can use the following commands to reformat the code:

pip install pre-commit
pre-commit install
pre-commit run --all-files

Ask for review

Once you feel comfortable with your PR, feel free to @ one of the contributors for review.

General: @Luodian @kcz358 @pufanyi
Audio: @pbcong @ngquangtrung57

Thank you for your contributions!

@Luodian (Contributor, Author) commented Jan 15, 2026

@claude

claude bot commented Jan 15, 2026

Claude finished @Luodian's task in 2m 20s


Code Review Complete ✅

Reviewed PR #995: TUI Interface Implementation

This is an impressive feature addition: a Terminal User Interface (TUI) for interactive configuration of lmms-eval. Here's my comprehensive review:


🎯 Overview

Strengths:

  • Well-structured TUI implementation using Textual framework
  • Smart early detection of --tui flag to avoid heavy imports
  • Comprehensive configuration UI with model, task, and settings tabs
  • Good separation of concerns (app.py, discovery.py)

Critical Issues to Address:

1. Files in Wrong Location ⚠️ BLOCKER

Two files are placed in the repository root instead of the proper location:

  • /home/runner/work/lmms-eval/lmms-eval/core.py
  • /home/runner/work/lmms-eval/lmms-eval/search.py

Issue: These files appear to be for UI/UX BM25 search (based on content), but they're:

  1. In the wrong location (repo root, not in a proper module)
  2. Not related to lmms-eval's core functionality
  3. Possibly included by accident from another project

Action Required:

  • If these files are needed for TUI, move them to lmms_eval/tui/
  • If they're not needed, remove them entirely
  • Update any imports accordingly

Fix this →

2. Missing Type Hints ❌ REQUIRED

Per CLAUDE.md line 27: "Type hints required for all code"

Files missing comprehensive type hints:

  • lmms_eval/tui/app.py - Many methods lack return type annotations
  • lmms_eval/tui/discovery.py - Some functions missing type hints

Examples from app.py:

# Line 63-69: Missing return type
def _strip_ansi(text: str):  # Should be: -> str
    global _ANSI_ESCAPE_RE
    ...

# Line 72-93: Missing return type  
def _highlight_bash_output(text: str):  # Should be: -> str
    ...

# Line 96-158: Missing return type
def _highlight_bash_syntax(line: str):  # Should be: -> str
    ...

Fix this →

3. Missing Docstrings ❌ REQUIRED

Per CLAUDE.md line 28: "Public APIs must have docstrings"

Missing docstrings in:

  • lmms_eval/tui/discovery.py:10 - discover_tasks() function
  • lmms_eval/tui/discovery.py:87 - discover_models() function
  • lmms_eval/tui/discovery.py:118 - get_popular_tasks() function
  • lmms_eval/tui/discovery.py:133 - get_popular_models() function
  • lmms_eval/tui/app.py - Most helper functions and classes

Fix this →
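A hypothetical sketch of the expected docstring style, using discover_tasks() as an example; the signature and return type below are assumptions and should match the actual implementation:

def discover_tasks() -> dict[str, dict]:
    """Discover registered tasks by scanning task YAML configs.

    Returns:
        A mapping from task name to its parsed metadata (e.g. group, tags).
    """
    ...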

4. Code Complexity ⚠️

lmms_eval/tui/app.py is 1,528 lines - this violates the principle in CLAUDE.md line 29: "Functions must be focused and small"

Recommendation: Consider splitting app.py into:

  • app.py - Main app class and entry point
  • screens.py - Screen classes (WelcomeScreen, ConfigScreen)
  • widgets.py - Custom widget components
  • utils.py - Helper functions (_strip_ansi, _highlight_bash_syntax, etc.)

5. Missing Tests ⚠️

Per CLAUDE.md line 37: "New features require tests"

No tests found for the new TUI functionality. At minimum, add:

  • Unit tests for discovery functions
  • Integration tests for config building
  • Tests for command generation logic
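As a starting point, a minimal pytest sketch for the discovery functions could look like the following; the test names and assertions are assumptions and should be adjusted to the actual return types:

from lmms_eval.tui.discovery import discover_models, discover_tasks


def test_discover_tasks_returns_entries():
    # Assumes discover_tasks() returns a non-empty collection of task entries.
    tasks = discover_tasks()
    assert len(tasks) > 0


def test_discover_models_returns_entries():
    # Assumes discover_models() returns a non-empty collection of model entries.
    models = discover_models()
    assert len(models) > 0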

6. Hardcoded Values ⚠️

lmms_eval/main.py:13

print("TUI mode requires 'textual' package. Install with: pip install lmms_eval[tui]")

Should use uv per CLAUDE.md line 19: "ONLY use uv, NEVER pip"

print("TUI mode requires 'textual' package. Install with: uv pip install -e .[tui]")

7. Dependency Management ✅ GOOD

pyproject.toml changes look good:

  • Proper use of optional dependencies [tui]
  • Added lmms-eval-tui entry point
  • Includes textual-image and textual-plotext

However, noticed in pyproject.toml:

[dependency-groups]
dev = [
    "textual>=7.2.0",  # Line 195: Version conflict?
    "textual-dev>=1.8.0",
]

vs

tui = [
    "textual>=0.47.0",  # Line 127: Different version
    ...
]

Question: Is textual>=7.2.0 in dev correct, or should it match the tui requirement?


📝 Detailed File Review

lmms_eval/__main__.py ✅ GOOD APPROACH

Lines 4-15: Smart early TUI detection before heavy imports - excellent for startup performance!

Lines 18-52: Clean quick-help display when no args provided

Issues:

  • Line 13: Use uv instead of pip in error message
  • Consider adding type hints to new functions

lmms_eval/protocol.py

Modified to add type definitions for chat content. Changes look reasonable but would benefit from review of:

  • Are all the new types (ChatTextContent, ChatImageContent, etc.) actually used by TUI?
  • If not, this might be scope creep for this PR

lmms_eval/api/metrics.py

Minimal changes, looks fine.

lmms_eval/tui/app.py - NEEDS WORK

Good aspects:

  • Clean separation of screens (WelcomeScreen, ConfigScreen)
  • Responsive design with dynamic tab loading
  • Good use of Textual's reactive system
  • Bash syntax highlighting is a nice touch

Issues:

  • Missing type hints throughout
  • No docstrings
  • File is too large (1,528 lines) - needs refactoring
  • Global variable usage (_ANSI_ESCAPE_RE, _CACHED_DEVICE)
  • Some functions could be simplified

Security note: Lines 183-207 use subprocess.run() with user input; the call appears safe (no shell=True), but verify all subprocess calls.

lmms_eval/tui/discovery.py ✅ MOSTLY GOOD

Clean implementation of task/model discovery with caching.

Issues:

  • Missing type hints on some functions
  • Missing docstrings
  • Exception handling is too broad (bare except: on lines 30, 68, 107)

Better exception handling:

except (IOError, yaml.YAMLError) as e:
    eval_logger.debug(f"Failed to parse {yaml_path}: {e}")

core.py and search.py ❌ WRONG LOCATION

These files implement a BM25 search engine for UI/UX style guides - they appear to be from a different project or accidentally included. They should not be in the repo root.


🔍 Potential Bugs

  1. lmms_eval/tui/app.py:211 - _CACHED_DEVICE global could cause issues in multi-threaded scenarios (a thread-safe caching alternative is sketched after this list)
  2. lmms_eval/tui/discovery.py:30 - Bare except could hide real errors
  3. lmms_eval/main.py:112 - Using undefined logging module (should be eval_logger)
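For the first point, one possible alternative to the module-level _CACHED_DEVICE global is functools.lru_cache; the detection logic below is illustrative only and should mirror whatever app.py currently does:

import functools


@functools.lru_cache(maxsize=1)
def detect_device() -> str:
    """Detect the available accelerator once and reuse the cached result."""
    try:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"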

✅ What's Working Well

  1. Feature completeness - Comprehensive TUI with all major config options
  2. User experience - Auto-countdown, keyboard shortcuts, visual feedback
  3. Performance - Smart lazy loading of tabs, early --tui detection
  4. Dependencies - Proper optional dependency handling
  5. Integration - Clean integration with existing CLI

📋 Checklist Before Merge

  • Remove or relocate core.py and search.py from repo root
  • Add type hints to all functions
  • Add docstrings to public APIs
  • Run uv run ruff format .
  • Run uv run ruff check . --fix
  • Run uv run pyright and fix type errors
  • Consider refactoring app.py into smaller modules
  • Add unit tests for discovery logic
  • Fix pip reference to use uv
  • Fix bare except clauses
  • Fix undefined logging reference
  • Verify textual version consistency in pyproject.toml

🎉 Summary

This is a valuable feature that will significantly improve the user experience for configuring lmms-eval. The core implementation is solid, but it needs refinement to meet the project's code quality standards before merging.

Priority fixes: File location, type hints, docstrings, and running formatters/linters.

Estimated effort: 2-4 hours to address all issues.

Great work on the TUI implementation! The user experience looks polished. Just needs some cleanup to match the codebase standards.

Branch: feat/tui-interface

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61c3158abc


Comment on core.py, lines 1 to 5
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
UI/UX Pro Max Core - BM25 search engine for UI/UX style guides
"""


P1: Avoid shadowing the external core package

Adding a top-level core.py introduces a name collision with the core package that the PLM model imports (from core.args ... in lmms_eval/models/simple/plm.py). When running from the repo root (or any cwd on sys.path), Python will resolve core to this new module, which is not a package, and core.args will fail with ModuleNotFoundError: 'core' is not a package. That makes --model plm unusable. Renaming/moving this helper module (e.g., under lmms_eval/ or a less generic name) avoids breaking the existing model import path.


Comment on lines 1030 to 1034
text=True,
bufsize=1,
env=env,
shell=True,
start_new_session=True,


P2: Avoid a default activation command that fails under /bin/sh

The TUI executes the composed command with shell=True (defaulting to /bin/sh on Linux), but the default activate_cmd is source .venv/bin/activate, which is a bashism and is not supported by /bin/sh on common distros (e.g., dash). That means the default “START” action fails immediately unless the user manually edits the activation command. Consider using a POSIX-compatible activation (. .venv/bin/activate) or running the shell with executable='/bin/bash' to make the default work.
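A minimal sketch of the second option, keeping shell=True but running under bash so that `source` works; the command string is a placeholder, not the actual composed command:

import subprocess

cmd = "source .venv/bin/activate && lmms-eval --model <model> --tasks <task>"  # placeholder
proc = subprocess.Popen(
    cmd,
    shell=True,
    executable="/bin/bash",  # use bash rather than /bin/sh so `source` is supported
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    start_new_session=True,
)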


@kcz358 (Collaborator) commented Jan 15, 2026

I turned off my opencode's auto formatter lol since we have our own pre-commit lint hook to do so.

- Remove OpenTUI terminal-based UI (had rendering issues)
- Add React + Vite + Tailwind CSS web UI
- FastAPI backend serves both API and static files
- CLI starts server and opens browser automatically

Features:
- Model selection dropdown
- Task list with search/filter and checkboxes
- Real-time command preview
- Live output streaming via SSE
- Start/Stop evaluation controls
- Settings: batch size, limit, device, verbosity, output path
@Luodian force-pushed the feat/tui-interface branch from e4a0ccf to d4c448e on January 15, 2026 at 13:39
- Add shell syntax highlighting for command preview and env vars editor
- Add ANSI color code parsing for log output
- Make Tasks and Environment Variables sections collapsible
- Unify typography with monospace font and consistent sizing
- Add search functionality to Select dropdown component
- Remove broken INVERT button
- Add group collapse/expand controls in task list
- Make log output maximizable
@Luodian (Contributor, Author) commented Jan 16, 2026

Web UI Feature Summary

Why This Feature?

Running LMMs-Eval evaluations currently requires constructing complex command-line arguments manually, which can be error-prone and difficult for users unfamiliar with all available options. The Web UI provides an intuitive interface that:

  • Lowers the barrier to entry - Users can explore available models and tasks visually without memorizing command syntax
  • Prevents configuration errors - Real-time command preview shows exactly what will be executed
  • Improves debugging experience - Live log streaming eliminates the need to monitor terminal output in a separate window
  • Enables quick iteration - Start/stop controls allow rapid testing of different configurations

Core Features

Model & Task Discovery: Browse 70+ registered models with a searchable dropdown. The task list supports search/filter with a hierarchical group display for easy benchmark discovery.

Real-time Command Preview: Configuration changes instantly update the generated command, making it easy to understand the CLI equivalent and copy for later use.

Live Log Streaming: Evaluation output streams in real-time via Server-Sent Events (SSE). No page refresh needed - watch progress as it happens.

Execution Control: One-click Start/Stop buttons with proper state management. Status indicator shows ready/running/error states.

How to Use

# Start the Web UI (opens browser automatically)
uv run lmms-eval-ui

# Custom port
LMMS_SERVER_PORT=3000 uv run lmms-eval-ui

The server starts on http://localhost:8000 by default and automatically opens your browser.

Architecture

  • Frontend: React + Vite + Tailwind CSS (pre-built, served as static files)
  • Backend: FastAPI with SSE for real-time log streaming
  • Auto-build: Frontend builds automatically on first run if Node.js 18+ is available
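For reference, a minimal sketch of how SSE log streaming can be wired up in FastAPI; the endpoint path, queue, and wiring are assumptions, not the actual backend code:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
log_queue: asyncio.Queue = asyncio.Queue()  # filled by the running evaluation


@app.get("/api/logs")
async def stream_logs() -> StreamingResponse:
    async def event_stream():
        while True:
            line = await log_queue.get()
            yield f"data: {line}\n\n"  # one SSE frame per log line

    return StreamingResponse(event_stream(), media_type="text/event-stream")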

Screenshots

  • Main Interface - Configuration Panel (screenshot)
  • Model Selection Dropdown (screenshot)
  • Live Log Streaming (screenshot)


Tested Functionality

  • Model selection and search
  • Task selection with search/filter
  • Real-time command preview generation
  • Start evaluation with live log streaming
  • Stop button state management (enabled only during running)
  • Error state handling and display
  • Clear log functionality
