feat: improved screenshot accuracy #11
OxBot Review
This PR significantly overhauls the screenshot-to-code tool with a multi-agent consensus pipeline (parallel syntax generation, frontier judge, and SSIM-based rendering validation) and adds 10-minute timeouts across the API route and frontend hook to support long-running swarms. The architecture is a meaningful upgrade, but a configuration mismatch in token limits and an overly restrictive judge output cap need correction before merge.
Notes
- Multi-agent swarm with SSIM scoring is a strong architectural improvement for generation accuracy.
- 10-minute timeouts are consistently applied in both the Next.js API route and the frontend execution hook.
- Several prompt comments reference different models than the assigned configuration variables (e.g., DeepSeek-R1 vs Kimi-K2.6); keeping these in sync will reduce maintenance confusion.
- The Dockerfile's system Chromium installation is a pragmatic way to satisfy Playwright shared library dependencies on slim images.
- Moving request.text() outside the try block in the API route removes structured 503 error handling for body parsing failures.
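For readers unfamiliar with the SSIM scoring mentioned in these notes, here is a minimal sketch of the standard SSIM formula, computed over a single global window on two equal-length 8-bit grayscale buffers. It is not the PR's implementation: the name `globalSsim` is hypothetical, and production code usually averages SSIM over sliding local windows rather than over the whole image at once.

```typescript
// Hypothetical helper: single-window (global) SSIM for two 8-bit grayscale buffers.
// Production code usually averages SSIM over sliding local windows instead.
function globalSsim(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("buffers must be non-empty and of equal length");
  }
  const n = a.length;
  const dynamicRange = 255; // 8-bit pixels
  const c1 = Math.pow(0.01 * dynamicRange, 2); // stabilizing constants from the
  const c2 = Math.pow(0.03 * dynamicRange, 2); // standard SSIM definition

  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;

  // Population variance and covariance (divide by n).
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;

  const numerator = (2 * meanA * meanB + c1) * (2 * cov + c2);
  const denominator = (meanA * meanA + meanB * meanB + c1) * (varA + varB + c2);
  return numerator / denominator;
}
```

Identical buffers score 1, and the score falls as luminance, contrast, or structure diverge, which is why it can serve as a fidelity signal between a source screenshot and a rendered candidate.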
Verdict: Needs Changes | 3 inline comment(s)
Automated review by OxBot
OxBot Review
This PR refactors the screenshot-to-code tool with a multi-agent consensus pipeline, judge model, and SSIM-based accuracy scoring, but it also modifies core platform infrastructure—including shared API routes, client hooks, and auth modules—rather than remaining self-contained under projects/. While the tool-level architecture is creative and the Dockerfile changes are pragmatic, the PR introduces significant cross-cutting concerns: blanket 10-minute timeouts risk resource exhaustion for all tools, the disabled browser sandbox creates a security vulnerability, and dead code mixed with fragile error handling undermines maintainability. The changeset needs structural realignment and security fixes before it can be merged safely.
Notes
- The PR is structurally misaligned: it is framed as a standalone community project but modifies core platform infrastructure (`app/src/app/api/tools/[toolId]/route.ts`, `app/src/hooks/use-tool-execution.ts`, `app/src/lib/auth.ts`) rather than remaining isolated under `projects/`.
- Applying a blanket, non-configurable 10-minute timeout across the unified API route and shared client hook risks resource exhaustion and proxy timeouts for all tools; long-running operations should use tool-specific timeouts, polling, or a Tier 2 async service architecture.
- The screenshot-to-code tool's multi-agent consensus architecture is promising but undermined by mixing sync and async execution patterns, dead code, stale docstrings, and a disabled browser sandbox that introduces security risk.
- The presence of an accidental whitespace change in `app/src/lib/auth.ts` suggests the PR was not carefully diff-reviewed before submission, raising confidence concerns for the broader changeset.
Automated review by OxBot
Hey @Arunmadhavan28,
Merge Conflicts
Code Structure
The concern is that This makes the code easier to review, test, and maintain and it's how our other 2 tools are structured.
OxBot Review
This PR claims to be an isolated community screenshot-to-code tool submission, but instead extensively modifies core platform infrastructure—including authentication, shared frontend hooks, generic API routes, and the shared Python services base image—while also introducing a massive 1,185-line monolithic pipeline. It adds a multi-agent consensus swarm with SSIM-based fidelity scoring, visual comparison, and an auto-healing loop, but the implementation suffers from severe architectural bloat, security regressions in iframe sandboxing, and a critical unauthorized change to free-tier billing limits. The changes pollute shared abstractions with tool-specific logic, introduce dead code, and create dangerous coupling between generated HTML and the parent application.
Notes
- The PR violates the community project isolation contract by touching core platform code (auth, shared hooks, base Docker images, generic API routes) instead of remaining self-contained under `projects/`.
- There is a fundamental mismatch between the PR description (screenshot accuracy improvements) and the actual changes, which include generic infrastructure modifications, billing limit changes, and UI framework alterations.
- The submission introduces severe architectural bloat across the stack, with multiple files exceeding 300–400 lines and mixing concerns (API orchestration, UI rendering, computer vision, prompts, and business logic in single modules).
- Tool-specific requirements are being forced into shared abstractions, including heavy system dependencies in the shared Tier 2 Dockerfile and a blanket 10-minute timeout in the universal tool-execution hook.
- Security and safety regressions span multiple layers, from weakened iframe sandboxing policies for user-generated HTML to unguarded SSR object access and hardcoded input field coupling in generic pages.
- The presence of dead code (unused spatial extractor), contradictory timeout configurations (600s vs. 1 hour), and a critical unauthorized billing change suggest inadequate review and integration testing before submission.
🚨 Verdict: Critical Issues Found | 23 inline comment(s) | 9 file(s) reviewed | ⏱️ 334s
Automated review by OxBot
OxBot Review
This PR attempts to improve screenshot-to-code accuracy through a multi-agent pipeline with OCR, visual diffing, SSIM scoring, and an iterative healing loop. While the feature ambition is commendable, the submission is structurally misaligned: it is framed as a self-contained community project but instead modifies core platform files across the frontend, API, auth, and shared infrastructure. The changes introduce a critical billing/auth regression, severe iframe sandbox security vulnerabilities, and unsustainable architectural bloat in several files exceeding 600–1200 lines. Overall, the PR requires significant restructuring, scope reduction, and security hardening before it can be considered for merge.
Notes
- The submission claims to be a community project but violates repository structure by modifying core monorepo files (auth, shared API routes, frontend pages, and base Docker images) instead of remaining isolated under `projects/`.
- A critical, out-of-scope change to core auth logic increases the free-tier limit 100x (5 to 500), which would severely impact platform usage controls and must be removed immediately.
- Security regressions in the ResultViewer component allow AI-generated HTML same-origin access via `allow-same-origin` iframe sandboxing, enabling XSS against parent-document cookies, localStorage, and DOM.
- Multiple files far exceed maintainability limits (tool.py >1,200 lines, ResultViewer >600 lines) with tight coupling, dead code, inline render functions, and hardcoded API endpoints that should be decoupled into focused modules.
- Heavy browser and OCR dependencies are being forced onto the shared Tier 2 base image rather than containerized within the project directory, bloating infrastructure for all tools.
- Operational risks abound: proxy timeouts extended to one hour, client timeouts to ten minutes, and multi-agent pipelines that can exceed thirty minutes without request-level cancellation or orphan-process cleanup.
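The sandboxing concern in these notes can be illustrated with a small sketch. Assuming the viewer renders generated HTML in an iframe, granting `allow-same-origin` alongside `allow-scripts` gives the framed document the embedding page's origin, so one defensive option is to strip that token from whatever list the component receives. The helper name `safeSandboxTokens` is hypothetical and does not appear in the PR.

```typescript
// Hypothetical helper: build an iframe `sandbox` attribute value for untrusted HTML.
// It drops `allow-same-origin` so the framed document is given an opaque origin.
function safeSandboxTokens(requested: string[]): string {
  const kept: string[] = [];
  for (let i = 0; i < requested.length; i++) {
    const token = requested[i].trim().toLowerCase();
    // Skip empty entries and the token that would grant same-origin access.
    if (token === "" || token === "allow-same-origin") continue;
    // De-duplicate while preserving the caller's order.
    if (kept.indexOf(token) === -1) kept.push(token);
  }
  return kept.join(" ");
}
```

With this in place, a call such as `safeSandboxTokens(["allow-scripts", "allow-same-origin"])` yields only `allow-scripts`, so scripts in the generated page can still run for preview purposes but cannot read the parent's cookies or storage.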
🚨 Verdict: Critical Issues Found | 24 inline comment(s) | 9 file(s) reviewed | ⏱️ 353s
Automated review by OxBot
OxBot Review
This PR aims to improve screenshot-to-code accuracy by introducing a multi-agent pipeline with OCR anchoring, SSIM scoring, and visual editing capabilities. However, it fundamentally misaligns with the repository structure by presenting itself as a standalone community project while extensively modifying core application infrastructure—including shared API routes, authentication, types, and the base Docker image. The new Python tool has grown into a 1,179-line monolith with critical security vulnerabilities in postMessage handlers, severe performance and cost overhead from parallel LLM calls, and a 600-second timeout that risks gateway failures. Additionally, critical syntax errors, removal of documentation, and hardcoded infrastructure timeouts introduce immediate stability risks across the entire platform. The changes require significant architectural refactoring, security hardening, and path realignment before they can be considered for merge.
Notes
- Mismatch between submission type and scope: The PR claims to be a community project submission but modifies core framework files (app/src/*, services/python-tools/Dockerfile) instead of being isolated under projects/. This violates the stated submission requirements and risks destabilizing the entire platform.
- Monolithic architecture: Both the Python tool (~1,179 lines) and the ResultViewer component have become unmaintainable monoliths mixing image processing, LLM orchestration, browser automation, UI state, and postMessage protocols. They need modular decomposition into focused modules and sub-components.
- Security posture: The PR introduces client-side script injection via postMessage without strict origin validation, creating XSS and data exfiltration risks in the screenshot-to-code output. This must be hardened before any code handling user-generated HTML reaches production.
- Performance and cost concerns: The multi-agent pipeline (parallel coders + judge + healing loops + high-detail image uploads) is extremely expensive and slow. Combined with 600s timeouts in shared infrastructure, this will likely cause worker starvation, gateway timeouts, and excessive LLM spend.
- Infrastructure contamination: Changes to shared resources—the unified API route timeout, base Docker image bloat, core auth/types/hooks—impact every tool in the ecosystem. These cross-cutting modifications should be decoupled from the tool submission and evaluated separately.
- Code quality and maintainability: The PR leaves issue-tracker comments in source code, removes existing JSDoc documentation, introduces invalid JSON syntax, and employs conflicting React patterns. A cleanup pass is needed to meet production standards.
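As a sketch of the "strict origin validation" the security note asks for, the guard below accepts a message only when its sender origin exactly matches an allow-list entry. The function name `isTrustedOrigin` and the example origins are hypothetical; the PR's actual handler is not shown in this review.

```typescript
// Hypothetical guard for a `message` event handler: accept only exact, known origins.
function isTrustedOrigin(eventOrigin: string, allowedOrigins: string[]): boolean {
  // "*" is a wildcard for postMessage targets, never a valid sender origin,
  // and "null" is what opaque-origin (for example sandboxed) frames report.
  if (eventOrigin === "" || eventOrigin === "*" || eventOrigin === "null") {
    return false;
  }
  // Exact string match only: no prefix, suffix, or substring checks.
  return allowedOrigins.indexOf(eventOrigin) !== -1;
}
```

In a handler this would typically run first, as in `if (!isTrustedOrigin(event.origin, allowed)) return;`. For frames sandboxed without `allow-same-origin`, whose origin serializes as `"null"`, comparing `event.source` against the iframe's `contentWindow` is the usual alternative check.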
🚨 Verdict: Critical Issues Found | 24 inline comment(s) | 11 file(s) reviewed | ⏱️ 366s
Automated review by OxBot
OxBot Review
Error: GitHub API POST /repos/Cyborg-Network/Oxtools/pulls/11/reviews: 422 {"message":"Unprocessable Entity","errors":["Line could not be resolved"],"documentation_url":"https://docs.github.com/rest/pulls/re
Automated review by OxBot
OxBot Review
This PR attempts to improve screenshot-to-code accuracy by adding OCR and computer vision dependencies, a multi-agent Python pipeline with SSIM-based healing loops, and an enhanced interactive ResultViewer. However, it is submitted as a standalone community project yet extensively modifies core platform infrastructure—including shared API routes, authentication, the generic ResultViewer, and the shared Tier 2 Docker base—violating self-containment requirements. The implementation introduces critical security vulnerabilities via iframe sandbox misconfiguration and wildcard postMessage origins, while the 1179-line Python tool and bloated shared component suffer from poor separation of concerns, dead code, and severe performance anti-patterns. Overall, the changes are ambitious but require significant architectural refactoring, security hardening, and scope correction before they can be merged.
Notes
- PR scope confusion: Submitted as a community project under projects/ but modifies core platform infrastructure (shared API routes, auth, base Docker image, generic UI components). Community tools must remain self-contained; platform changes require separate justification and review.
- Cross-cutting security risks: The combination of sandboxed iframe misconfiguration with srcdoc inheritance and wildcard postMessage fallbacks creates exploitable XSS/clickjacking vectors across the tool execution surface.
- Monolithic architecture: Both the Python backend (~1179 lines) and React frontend viewer have grown into monolithic modules mixing image processing, browser automation, API orchestration, and UI state, violating separation of concerns and maintainability standards.
- Shared infrastructure contamination: Heavy tool-specific dependencies (Tesseract, OpenCV, Playwright) and global timeout increases are being forced into shared Tier 2 infrastructure used by all tools, impacting build times, cache efficiency, and resource limits platform-wide.
- Dead code and unverified claims: The PR title claims accuracy improvements, yet the diff contains unused pipeline steps (spatial extractor) and primarily increases timeouts without documenting how duration relates to accuracy, suggesting incomplete implementation.
- Performance anti-patterns across stack: Fresh browser launches per rendering pass and React key-driven iframe remounts indicate a need for lifecycle reuse strategies that span both the Python service and the frontend viewer.
🚨 Verdict: Critical Issues Found | 19 inline comment(s) | 10 file(s) reviewed | ⏱️ 723s
Automated review by OxBot
OxBot Review
This PR attempts to improve screenshot-to-code accuracy by adding a multi-agent consensus pipeline, OCR/image processing, visual editing, and configurable timeouts. However, it violates repository boundaries by modifying core shared infrastructure—including app/src/ framework code, the unified API route, generic UI components, and the shared Python runner Dockerfile—rather than submitting a self-contained community project under projects/. The changes introduce severe architectural bloat with the Python tool ballooning to ~1150 lines and the generic ResultViewer to ~800 lines, while also adopting risky patterns like 10-minute synchronous HTTP timeouts, heavy dependencies forced onto the shared Docker base image, and tool-specific API calls embedded in generic components.
Notes
- Boundary violation: Described as a community project submission, yet changes are scattered across core `app/src/` and shared `services/python-tools/` instead of being isolated under `projects/[name]/`, risking platform-wide destabilization.
- Architectural bloat: Both the Python tool (~1150 lines) and the generic `ResultViewer` (~800 lines) far exceed maintainable thresholds, violating separation of concerns by packing orchestration, rendering, UI scripting, and image processing into single modules.
- Infrastructure contamination: The shared Python Dockerfile is permanently bloated with Tesseract, OpenCV, and Chromium for a single tool, significantly increasing image size and attack surface for all Python tools; prefer tool-specific images or multi-stage builds.
- Synchronous timeout anti-pattern: The stack introduces 10-minute synchronous HTTP timeouts (600,000ms) across the tool config, API route, and hook, which likely exceed serverless limits and create poor UX; an async job-polling or webhook model would be more robust.
- Generic component coupling: The supposedly reusable `ResultViewer` and shared tool execution page are hardcoded with screenshot-to-code-specific logic (direct API fetches, image comparison, edit panels), undermining their reusability across the broader tool ecosystem.
- Documentation regressions: Several core files (API route, auth module) have had extensive architectural JSDoc stripped without replacement, harming contributor onboarding and long-term maintainability.
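The async job-polling model suggested in the timeout note could look roughly like the following in-memory sketch: the submit call returns a job id immediately, a background worker updates the job, and the client polls a status lookup instead of holding one HTTP request open for ten minutes. All names here (`JobRegistry`, `submit`, `poll`, and so on) are hypothetical, and a real deployment would need persistent storage rather than a process-local object.

```typescript
// Hypothetical in-memory job registry illustrating a submit-then-poll flow.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

class JobRegistry {
  private jobs: { [id: string]: Job } = {};
  private nextId = 1;

  // Called by the submit endpoint: returns immediately with a job id.
  submit(): string {
    const id = "job-" + this.nextId++;
    this.jobs[id] = { id: id, status: "queued" };
    return id;
  }

  // Called by the background worker as the pipeline progresses.
  markRunning(id: string): void {
    this.mustGet(id).status = "running";
  }

  complete(id: string, result: string): void {
    const job = this.mustGet(id);
    job.status = "succeeded";
    job.result = result;
  }

  fail(id: string, error: string): void {
    const job = this.mustGet(id);
    job.status = "failed";
    job.error = error;
  }

  // Called by the status endpoint that the client polls; returns a copy.
  poll(id: string): Job | undefined {
    const job = this.jobs[id];
    return job ? { ...job } : undefined;
  }

  private mustGet(id: string): Job {
    const job = this.jobs[id];
    if (!job) throw new Error("unknown job: " + id);
    return job;
  }
}
```

Because each poll is a short request, no single HTTP call has to outlive proxy or serverless limits, and a failed pipeline can report a structured error through the same status lookup.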
Automated review by OxBot
OxBot Review
This PR aims to improve screenshot-to-code accuracy by introducing a multi-agent pipeline with SSIM scoring, OCR, iterative refinement, and enhanced UI components including side-by-side comparison and visual editing. While the ambition and some individual improvements—such as configurable timeouts and dynamic image input detection—are valuable, the submission suffers from a fundamental structural mismatch: it is presented as an isolated community project but extensively modifies core application files and shared infrastructure rather than residing under projects/. The implementation is highly monolithic, with the Python tool ballooning to over 1,150 lines and React components growing unwieldy, while architectural choices like 10-minute synchronous timeouts and per-render Playwright launches introduce significant performance, cost, and reliability risks.
Notes
- The submission is described as a standalone community project but modifies core application configuration, shared API routes, and the global Python services base image rather than isolating changes under `projects/[project-name]/`. This forces tool-specific dependencies (Tesseract, Playwright) onto every Tier 2 tool and breaks the repository's modular boundaries.
- Both the frontend and backend implementations suffer from excessive module size and tight coupling. The Python tool, React result viewer, and tool page each exceed maintainability thresholds and should be split into focused sub-modules (e.g., rendering, scoring, prompts, UI panels).
- The combination of 600-second timeouts, multiple retries, and per-render browser launches creates a blocking architecture that is incompatible with standard HTTP infrastructure. The system should move toward asynchronous job processing with status polling or SSE for multi-minute workloads.
- The judge-model step processes full HTML outputs and high-resolution images for ranking, which is prohibitively expensive and likely to hit token limits. A lighter-weight evaluation strategy—such as structural diffing or truncated previews—is needed for viability.
- Whitespace-only changes across core configuration files introduce unnecessary diff noise and risk conflicting with the repository's existing Biome formatting rules. These should be reverted.
- Although types and hooks are updated to support per-tool timeouts, the unified API route implements a timeout equal to the platform `maxDuration` with no headroom, preventing graceful error serialization and risking opaque platform-level 504 responses.
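The headroom point in the last note can be made concrete with a small sketch: clamp each tool's requested timeout to a ceiling that sits below the platform limit, so the route still has time to serialize its own error response. The function name `resolveToolTimeoutMs` and the 5-second default margin are assumptions for illustration, not values taken from the PR.

```typescript
// Hypothetical helper: clamp a per-tool timeout so it ends before the platform limit.
function resolveToolTimeoutMs(
  requestedMs: number,
  maxDurationSeconds: number,
  headroomMs: number = 5000 // assumed margin for serializing an error response
): number {
  const ceilingMs = maxDurationSeconds * 1000 - headroomMs;
  if (ceilingMs <= 0) {
    throw new Error("headroom must be smaller than maxDuration");
  }
  // Missing, zero, negative, or NaN requests fall back to the ceiling.
  if (!(requestedMs > 0)) return ceilingMs;
  return Math.min(requestedMs, ceilingMs);
}
```

Under these assumptions, a tool asking for the full 600,000 ms on a platform with a 600-second `maxDuration` would be cut off at 595,000 ms, leaving the route time to return a structured timeout error instead of a platform-level 504.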
Automated review by OxBot
ms-shashank left a comment
Was the extension directory removed completely? I can still see an extension directory in the last commit; I believe only the files inside it were cleared. Can you remove the extension directory itself?



Oxtools Submission
Project name: Arunmadhavan
Contributor: Arunmadhavan28
Demo: [Link to Loom or YouTube recording — required]
What does this tool do?
screenshot to code improved accuracy
Submission checklist
Check every box before requesting a review. Unchecked items will result in the PR being sent back.
Structure
- `projects/[my-project-name]/`
Required files
- `Dockerfile` is present and `docker build .` succeeds
- `docker-compose.yml` is present and `docker compose up` starts the app
- `oxlo-manifest.json` is present and all fields are filled in
- `.env.example` lists every environment variable the project needs (with empty values)
- `README.md` is present with setup instructions a reviewer can follow exactly
Security
- `.env` file is not included in this PR
- `git grep -i "api_key"` and found no leaks
Oxlo API
- `OXLO_API_KEY` environment variable
For maintainers
- `docker build .` succeeded locally
- `docker compose up` ran successfully and app is reachable