feat: improved screenshot accuracy by Arunmadhavan28 · Pull Request #11 · Cyborg-Network/Oxtools

Arunmadhavan28 · 2026-04-29T05:17:16Z

Oxtools Submission

Project name: Arunmadhavan
Contributor: Arunmadhavan28
Demo: [Link to Loom or YouTube recording — required]

What does this tool do?

Improves the accuracy of the screenshot-to-code tool.

Submission checklist

Check every box before requesting a review. Unchecked items will result in the PR being sent back.

Structure

[x] My project is in its own directory under projects/[my-project-name]/
[x] I have not placed any files directly in the repository root

Required files

[x] Dockerfile is present and docker build . succeeds
[x] docker-compose.yml is present and docker compose up starts the app
[x] oxlo-manifest.json is present and all fields are filled in
[x] .env.example lists every environment variable the project needs (with empty values)
[x] README.md is present with setup instructions a reviewer can follow exactly

Security

[x] No API keys, private keys, or secrets are hardcoded anywhere in the codebase
[x] My actual .env file is not included in this PR
[x] I have verified my diff with git grep -i "api_key" and found no leaks

Oxlo API

[x] The tool makes at least one functional call to the Oxlo API
[x] The API key is read from the OXLO_API_KEY environment variable

For maintainers

docker build . succeeded locally
Security scan passed — no secrets in diff
docker compose up ran successfully and app is reachable
Oxlo API integration verified
Approved for merge

vercel · 2026-04-29T05:17:21Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
oxtools	Ready	Preview, Comment	May 20, 2026 1:39pm

oxlo-ai

OxBot Review

This PR significantly overhauls the screenshot-to-code tool with a multi-agent consensus pipeline (parallel syntax generation, frontier judge, and SSIM-based rendering validation) and adds 10-minute timeouts across the API route and frontend hook to support long-running swarms. The architecture is a meaningful upgrade, but a configuration mismatch in token limits and an overly restrictive judge output cap need correction before merge.

Notes

Multi-agent swarm with SSIM scoring is a strong architectural improvement for generation accuracy.
10-minute timeouts are consistently applied in both the Next.js API route and the frontend execution hook.
Several prompt comments reference different models than the assigned configuration variables (e.g., DeepSeek-R1 vs Kimi-K2.6); keeping these in sync will reduce maintenance confusion.
The Dockerfile's system Chromium installation is a pragmatic way to satisfy Playwright shared library dependencies on slim images.
Moving request.text() outside the try block in the API route removes structured 503 error handling for body parsing failures.
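
For readers unfamiliar with the SSIM scoring mentioned in this review, here is a minimal sketch of the standard SSIM formula computed over a single global window. It is an illustration only and is not taken from the PR's tool.py: the function name `global_ssim` and the flat grayscale pixel lists are assumptions, and production implementations (for example scikit-image's `structural_similarity`) average the score over local sliding windows instead.

```python
from statistics import fmean


def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM for two equal-length grayscale pixel sequences.

    Hypothetical helper for illustration; real pipelines usually compute
    SSIM over local windows and average the results.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")

    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    n = len(a)
    mean_a, mean_b = fmean(a), fmean(b)
    var_a = sum((x - mean_a) * (x - mean_a) for x in a) / n
    var_b = sum((y - mean_b) * (y - mean_b) for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a * mean_a + mean_b * mean_b + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Identical inputs score approximately 1.0, while an inverted checker pattern scores below zero, so a threshold on the score can serve as a rough fidelity gate between the target screenshot and the rendered output.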

Verdict: Needs Changes | 3 inline comment(s)

Automated review by OxBot

oxlo-ai

OxBot Review

This PR refactors the screenshot-to-code tool with a multi-agent consensus pipeline, judge model, and SSIM-based accuracy scoring, but it also modifies core platform infrastructure—including shared API routes, client hooks, and auth modules—rather than remaining self-contained under projects/. While the tool-level architecture is creative and the Dockerfile changes are pragmatic, the PR introduces significant cross-cutting concerns: blanket 10-minute timeouts risk resource exhaustion for all tools, the disabled browser sandbox creates a security vulnerability, and dead code mixed with fragile error handling undermines maintainability. The changeset needs structural realignment and security fixes before it can be merged safely.

Notes

The PR is structurally misaligned: it is framed as a standalone community project submission but modifies core platform infrastructure (app/src/app/api/tools/[toolId]/route.ts, app/src/hooks/use-tool-execution.ts, app/src/lib/auth.ts) rather than remaining isolated under projects/.
Applying a blanket, non-configurable 10-minute timeout across the unified API route and shared client hook risks resource exhaustion and proxy timeouts for all tools; long-running operations should use tool-specific timeouts, polling, or a Tier 2 async service architecture.
The screenshot-to-code tool's multi-agent consensus architecture is promising but undermined by mixing sync and async execution patterns, dead code, stale docstrings, and a disabled browser sandbox that introduces security risk.
The presence of an accidental whitespace change in app/src/lib/auth.ts suggests the PR was not carefully diff-reviewed before submission, raising confidence concerns for the broader changeset.
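
The tool-specific timeout suggestion above could look roughly like the following sketch. Every name and number here (`DEFAULT_TIMEOUT_S`, `MAX_TIMEOUT_S`, `TOOL_TIMEOUTS_S`, `resolve_timeout`) is an assumption made for illustration, not the platform's actual configuration.

```python
# Hypothetical values: a short default for ordinary tools and a hard ceiling.
DEFAULT_TIMEOUT_S = 60
MAX_TIMEOUT_S = 600

# Hypothetical per-tool overrides; only long-running tools opt in.
TOOL_TIMEOUTS_S = {"screenshot-to-code": 600}


def resolve_timeout(tool_id, overrides=TOOL_TIMEOUTS_S):
    """Return the timeout in seconds for one tool, clamped to [1, MAX_TIMEOUT_S]."""
    requested = overrides.get(tool_id, DEFAULT_TIMEOUT_S)
    return max(1, min(requested, MAX_TIMEOUT_S))
```

With a lookup like this, only the tool that needs a long window pays for it, and every other tool keeps the short default.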

⚠️ Verdict: Needs Changes | 11 inline comment(s) | 6 file(s) reviewed | ⏱️ 277s


ms-shashank · 2026-04-29T10:43:21Z

Hey @Arunmadhavan28,

Merge Conflicts

This PR has merge conflicts in app/src/app/api/tools/[toolId]/route.ts - please resolve them first. Once resolved, our CI (lint + build checks) will automatically run.

Code Structure

The concern is that tool.py is a 700+ line monolithic file with everything packed together: API client logic, multi-agent orchestration, prompt definitions, SSIM scoring, Playwright automation, image processing, and error handling.
Please refactor this into a modular structure. For example:
services/python-tools/tools/screenshot-to-code/
├── tool.py # Main entry point + FastAPI endpoint (thin)
├── agents.py # Multi-agent orchestration (swarm, consensus)
├── prompts.py # System prompts and prompt templates
├── browser.py # Playwright screenshot logic
├── scoring.py # SSIM scoring and validation
├── config.py # Constants, model configs, timeouts
└── requirements.txt

This makes the code easier to review, test, and maintain, and it's how our other two tools are structured.

oxlo-ai

OxBot Review

This PR claims to be an isolated community screenshot-to-code tool submission, but instead extensively modifies core platform infrastructure—including authentication, shared frontend hooks, generic API routes, and the shared Python services base image—while also introducing a massive 1,185-line monolithic pipeline. It adds a multi-agent consensus swarm with SSIM-based fidelity scoring, visual comparison, and an auto-healing loop, but the implementation suffers from severe architectural bloat, security regressions in iframe sandboxing, and a critical unauthorized change to free-tier billing limits. The changes pollute shared abstractions with tool-specific logic, introduce dead code, and create dangerous coupling between generated HTML and the parent application.

Notes

The PR violates the community project isolation contract by touching core platform code (auth, shared hooks, base Docker images, generic API routes) instead of remaining self-contained under projects/.
There is a fundamental mismatch between the PR description (screenshot accuracy improvements) and the actual changes, which include generic infrastructure modifications, billing limit changes, and UI framework alterations.
The submission introduces severe architectural bloat across the stack, with multiple files exceeding 300–400 lines and mixing concerns (API orchestration, UI rendering, computer vision, prompts, and business logic in single modules).
Tool-specific requirements are being forced into shared abstractions, including heavy system dependencies in the shared Tier 2 Dockerfile and a blanket 10-minute timeout in the universal tool-execution hook.
Security and safety regressions span multiple layers, from weakened iframe sandboxing policies for user-generated HTML to unguarded SSR object access and hardcoded input field coupling in generic pages.
The presence of dead code (unused spatial extractor), contradictory timeout configurations (600s vs. 1 hour), and a critical unauthorized billing change suggest inadequate review and integration testing before submission.

🚨 Verdict: Critical Issues Found | 23 inline comment(s) | 9 file(s) reviewed | ⏱️ 334s


oxlo-ai

OxBot Review

This PR attempts to improve screenshot-to-code accuracy through a multi-agent pipeline with OCR, visual diffing, SSIM scoring, and an iterative healing loop. While the feature ambition is commendable, the submission is structurally misaligned: it is framed as a self-contained community project but instead modifies core platform files across the frontend, API, auth, and shared infrastructure. The changes introduce a critical billing/auth regression, severe iframe sandbox security vulnerabilities, and unsustainable architectural bloat in several files exceeding 600–1200 lines. Overall, the PR requires significant restructuring, scope reduction, and security hardening before it can be considered for merge.

Notes

The submission claims to be a community project but violates repository structure by modifying core monorepo files (auth, shared API routes, frontend pages, and base Docker images) instead of remaining isolated under projects/.
A critical, out-of-scope change to core auth logic increases the free-tier limit 100x (5 to 500), which would severely impact platform usage controls and must be removed immediately.
Security regressions in the ResultViewer component allow AI-generated HTML same-origin access via allow-same-origin iframe sandboxing, enabling XSS against parent-document cookies, localStorage, and DOM.
Multiple files far exceed maintainability limits (tool.py >1,200 lines, ResultViewer >600 lines) with tight coupling, dead code, inline render functions, and hardcoded API endpoints that should be decoupled into focused modules.
Heavy browser and OCR dependencies are being forced onto the shared Tier 2 base image rather than containerized within the project directory, bloating infrastructure for all tools.
Operational risks abound: proxy timeouts extended to one hour, client timeouts to ten minutes, and multi-agent pipelines that can exceed thirty minutes without request-level cancellation or orphan-process cleanup.
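
As a rough illustration of the iframe sandboxing concern in the notes above, the sketch below flags sandbox attribute values that would give embedded HTML same-origin access. The helper name `sandbox_risks` and its messages are hypothetical; the underlying rule (a document sandboxed with both `allow-scripts` and `allow-same-origin` can reach the parent's origin and even remove its own sandbox) is standard browser behaviour.

```python
def sandbox_risks(sandbox_attr):
    """List the risks in an iframe `sandbox` attribute value.

    Hypothetical lint helper: pass None when the attribute is absent,
    or the raw attribute string (space-separated tokens) otherwise.
    """
    if sandbox_attr is None:
        return ["no sandbox attribute: embedded content is not sandboxed at all"]

    tokens = set(sandbox_attr.lower().split())
    risks = []
    if "allow-same-origin" in tokens:
        risks.append("allow-same-origin: content keeps the embedding origin")
    if {"allow-scripts", "allow-same-origin"} <= tokens:
        risks.append(
            "allow-scripts + allow-same-origin: scripts can reach parent "
            "cookies, localStorage, and DOM, and can remove the sandbox"
        )
    return risks
```

A check along these lines could run in CI against any component that renders AI-generated HTML; an empty list for `"allow-scripts"` alone means scripts still run, but in an opaque origin.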

🚨 Verdict: Critical Issues Found | 24 inline comment(s) | 9 file(s) reviewed | ⏱️ 353s


oxlo-ai

OxBot Review

This PR aims to improve screenshot-to-code accuracy by introducing a multi-agent pipeline with OCR anchoring, SSIM scoring, and visual editing capabilities. However, it fundamentally misaligns with the repository structure by presenting itself as a standalone community project while extensively modifying core application infrastructure—including shared API routes, authentication, types, and the base Docker image. The new Python tool has grown into a 1,179-line monolith with critical security vulnerabilities in postMessage handlers, severe performance and cost overhead from parallel LLM calls, and a 600-second timeout that risks gateway failures. Additionally, critical syntax errors, removal of documentation, and hardcoded infrastructure timeouts introduce immediate stability risks across the entire platform. The changes require significant architectural refactoring, security hardening, and path realignment before they can be considered for merge.

Notes

Mismatch between submission type and scope: The PR claims to be a community project submission but modifies core framework files (app/src/*, services/python-tools/Dockerfile) instead of being isolated under projects/. This violates the stated submission requirements and risks destabilizing the entire platform.
Monolithic architecture: Both the Python tool (~1,179 lines) and the ResultViewer component have become unmaintainable monoliths mixing image processing, LLM orchestration, browser automation, UI state, and postMessage protocols. They need modular decomposition into focused modules and sub-components.
Security posture: The PR introduces client-side script injection via postMessage without strict origin validation, creating XSS and data exfiltration risks in the screenshot-to-code output. This must be hardened before any code handling user-generated HTML reaches production.
Performance and cost concerns: The multi-agent pipeline (parallel coders + judge + healing loops + high-detail image uploads) is extremely expensive and slow. Combined with 600s timeouts in shared infrastructure, this will likely cause worker starvation, gateway timeouts, and excessive LLM spend.
Infrastructure contamination: Changes to shared resources—the unified API route timeout, base Docker image bloat, core auth/types/hooks—impact every tool in the ecosystem. These cross-cutting modifications should be decoupled from the tool submission and evaluated separately.
Code quality and maintainability: The PR leaves issue-tracker comments in source code, removes existing JSDoc documentation, introduces invalid JSON syntax, and employs conflicting React patterns. A cleanup pass is needed to meet production standards.

🚨 Verdict: Critical Issues Found | 24 inline comment(s) | 11 file(s) reviewed | ⏱️ 366s


oxlo-ai · 2026-05-04T06:54:52Z

OxBot Review

⚠️ Review encountered an error. The team has been notified.

Error: GitHub API POST /repos/Cyborg-Network/Oxtools/pulls/11/reviews: 422 {"message":"Unprocessable Entity","errors":["Line could not be resolved"],"documentation_url":"https://docs.github.com/rest/pulls/re


oxlo-ai

OxBot Review

This PR attempts to improve screenshot-to-code accuracy by adding OCR and computer vision dependencies, a multi-agent Python pipeline with SSIM-based healing loops, and an enhanced interactive ResultViewer. However, it is submitted as a standalone community project yet extensively modifies core platform infrastructure—including shared API routes, authentication, the generic ResultViewer, and the shared Tier 2 Docker base—violating self-containment requirements. The implementation introduces critical security vulnerabilities via iframe sandbox misconfiguration and wildcard postMessage origins, while the 1179-line Python tool and bloated shared component suffer from poor separation of concerns, dead code, and severe performance anti-patterns. Overall, the changes are ambitious but require significant architectural refactoring, security hardening, and scope correction before they can be merged.

Notes

PR scope confusion: Submitted as a community project under projects/ but modifies core platform infrastructure (shared API routes, auth, base Docker image, generic UI components). Community tools must remain self-contained; platform changes require separate justification and review.
Cross-cutting security risks: The combination of sandboxed iframe misconfiguration with srcdoc inheritance and wildcard postMessage fallbacks creates exploitable XSS/clickjacking vectors across the tool execution surface.
Monolithic architecture: Both the Python backend (~1179 lines) and React frontend viewer have grown into monolithic modules mixing image processing, browser automation, API orchestration, and UI state, violating separation of concerns and maintainability standards.
Shared infrastructure contamination: Heavy tool-specific dependencies (Tesseract, OpenCV, Playwright) and global timeout increases are being forced into shared Tier 2 infrastructure used by all tools, impacting build times, cache efficiency, and resource limits platform-wide.
Dead code and unverified claims: The PR title claims accuracy improvements, yet the diff contains unused pipeline steps (spatial extractor) and primarily increases timeouts without documenting how duration relates to accuracy, suggesting incomplete implementation.
Performance anti-patterns across stack: Fresh browser launches per rendering pass and React key-driven iframe remounts indicate a need for lifecycle reuse strategies that span both the Python service and the frontend viewer.

🚨 Verdict: Critical Issues Found | 19 inline comment(s) | 10 file(s) reviewed | ⏱️ 723s


oxlo-ai

OxBot Review

This PR attempts to improve screenshot-to-code accuracy by adding a multi-agent consensus pipeline, OCR/image processing, visual editing, and configurable timeouts. However, it violates repository boundaries by modifying core shared infrastructure—including app/src/ framework code, the unified API route, generic UI components, and the shared Python runner Dockerfile—rather than submitting a self-contained community project under projects/. The changes introduce severe architectural bloat with the Python tool ballooning to ~1150 lines and the generic ResultViewer to ~800 lines, while also adopting risky patterns like 10-minute synchronous HTTP timeouts, heavy dependencies forced onto the shared Docker base image, and tool-specific API calls embedded in generic components.

Notes

Boundary violation: Described as a community project submission, yet changes are scattered across core app/src/ and shared services/python-tools/ instead of being isolated under projects/[name]/, risking platform-wide destabilization.
Architectural bloat: Both the Python tool (~1150 lines) and the generic ResultViewer (~800 lines) far exceed maintainable thresholds, violating separation of concerns by packing orchestration, rendering, UI scripting, and image processing into single modules.
Infrastructure contamination: The shared Python Dockerfile is permanently bloated with Tesseract, OpenCV, and Chromium for a single tool, significantly increasing image size and attack surface for all Python tools; prefer tool-specific images or multi-stage builds.
Synchronous timeout anti-pattern: The stack introduces 10-minute synchronous HTTP timeouts (600,000ms) across the tool config, API route, and hook, which likely exceed serverless limits and create poor UX; an async job-polling or webhook model would be more robust.
Generic component coupling: The supposedly reusable ResultViewer and shared tool execution page are hardcoded with screenshot-to-code-specific logic (direct API fetches, image comparison, edit panels), undermining their reusability across the broader tool ecosystem.
Documentation regressions: Several core files (API route, auth module) have had extensive architectural JSDoc stripped without replacement, harming contributor onboarding and long-term maintainability.
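
The async job-polling model suggested in the notes above can be sketched as follows, assuming an in-memory store backed by a thread pool. `JobStore` and its methods are hypothetical names; a real service would persist job state outside the process and expose `submit` and `status` as HTTP endpoints.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobStore:
    """Hypothetical in-memory job registry for a submit-then-poll workflow."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, fn, *args):
        """Start fn(*args) in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id):
        """Report the job state without blocking the caller."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}
```

The client would call `submit` once, receive the job id right away, and poll `status` on a short interval, so no single HTTP request has to stay open for ten minutes.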

⚠️ Verdict: Needs Changes | 21 inline comment(s) | 10 file(s) reviewed | ⏱️ 554s


oxlo-ai

OxBot Review

This PR aims to improve screenshot-to-code accuracy by introducing a multi-agent pipeline with SSIM scoring, OCR, iterative refinement, and enhanced UI components including side-by-side comparison and visual editing. While the ambition and some individual improvements—such as configurable timeouts and dynamic image input detection—are valuable, the submission suffers from a fundamental structural mismatch: it is presented as an isolated community project but extensively modifies core application files and shared infrastructure rather than residing under projects/. The implementation is highly monolithic, with the Python tool ballooning to over 1,150 lines and React components growing unwieldy, while architectural choices like 10-minute synchronous timeouts and per-render Playwright launches introduce significant performance, cost, and reliability risks.

Notes

The submission is described as a standalone community project but modifies core application configuration, shared API routes, and the global Python services base image rather than isolating changes under projects/[project-name]/. This forces tool-specific dependencies (Tesseract, Playwright) onto every Tier 2 tool and breaks the repository's modular boundaries.
Both the frontend and backend implementations suffer from excessive module size and tight coupling. The Python tool, React result viewer, and tool page each exceed maintainability thresholds and should be split into focused sub-modules (e.g., rendering, scoring, prompts, UI panels).
The combination of 600-second timeouts, multiple retries, and per-render browser launches creates a blocking architecture that is incompatible with standard HTTP infrastructure. The system should move toward asynchronous job processing with status polling or SSE for multi-minute workloads.
The judge-model step processes full HTML outputs and high-resolution images for ranking, which is prohibitively expensive and likely to hit token limits. A lighter-weight evaluation strategy—such as structural diffing or truncated previews—is needed for viability.
Whitespace-only changes across core configuration files introduce unnecessary diff noise and risk conflicting with the repository's existing Biome formatting rules. These should be reverted.
Although types and hooks are updated to support per-tool timeouts, the unified API route implements a timeout equal to the platform maxDuration with no headroom, preventing graceful error serialization and risking opaque platform-level 504 responses.
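
The headroom point in the last note can be illustrated with a small sketch: the upstream call is given slightly less time than the platform limit, so the route still has time to serialize a structured error. The function name `upstream_timeout_s` and the 10-second default are assumptions for illustration only.

```python
def upstream_timeout_s(max_duration_s, headroom_s=10.0):
    """Timeout for the upstream call, leaving headroom under the platform limit.

    Hypothetical helper: headroom_s is the time reserved for turning an
    upstream timeout into a structured error response.
    """
    if headroom_s <= 0 or headroom_s >= max_duration_s:
        raise ValueError("headroom must be positive and below max_duration_s")
    return max_duration_s - headroom_s
```

With a platform limit of 600 seconds, the upstream call would be cut off at 590 seconds, and the remaining 10 seconds are available for returning a clear timeout error instead of an opaque platform-level 504.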

⚠️ Verdict: Needs Changes | 19 inline comment(s) | 15 file(s) reviewed | ⏱️ 569s


Arunmadhavan28 · 2026-05-12T06:02:31Z

Arunmadhavan28 · 2026-05-12T06:03:57Z

ms-shashank

Everything is good to merge, but can you clean up the extension files from this branch so that they don't get mixed into this repo?

ms-shashank

Was the extension directory completely removed? I can still see an extension directory in the last commit; I believe only the files inside it were cleared. Can you remove the extension directory itself?

ms-shashank

LGTM!

feat: improved screenshot accuracy

0e4ace5

Arunmadhavan28 requested a review from ms-shashank April 29, 2026 05:17

oxlo-ai Bot reviewed Apr 29, 2026


Comment thread services/python-tools/tools/screenshot-to-code/tool.py Outdated

Comment thread services/python-tools/tools/screenshot-to-code/tool.py Outdated

Comment thread app/src/app/api/tools/[toolId]/route.ts Outdated

Cyborg-Network deleted a comment from oxlo-ai Bot Apr 29, 2026

oxlo-ai Bot reviewed Apr 29, 2026


feat: reduced latency|multiple pics|live preview

754330c

Arunmadhavan28 requested a review from beekay2706 as a code owner May 4, 2026 05:02

vercel Bot deployed to Preview May 4, 2026 05:02