
ADFA-3813 | Fix OCR image metadata parsing and refactor CV domain #1300

Open
jatezzz wants to merge 3 commits into stage from fix/ADFA-3813-ocr-metadata-parsing-experimental

Conversation

Collaborator

@jatezzz commented May 13, 2026

Description

Fixes an OCR issue where arbitrary metadata values for image files were parsed incorrectly (e.g., reading "image" as "im" or "im_age"), causing broken android:src references in the generated XML. Alongside this fix, the Computer Vision architecture was heavily refactored: complex domain logic was extracted from ComputerVisionViewModel and the monolithic repository into focused, single-responsibility UseCases (RunVisionUC, GenerateXmlUC, PrepareImageUC, etc.) and helper objects (DetectionScaler, LayoutTreeBuilder, TextAssociator).

Details

  • Added regex replacements in DrawableCleaner (ValueCleanersImpl.kt) to correct common OCR misinterpretations for the word "image".
  • Replaced ComputerVisionRepository with VisionRepository to purely handle ML operations.
  • Split monolithic ViewModel logic into isolated UseCases (GenerateXmlUC, ImportPlaceholderImageUC, PrepareImageUC, RemovePlaceholderImageUC, RunVisionUC).
  • Extracted UI mapping and XML tree building logic into standalone components (LayoutTreeBuilder, TextAssociator, and DetectionScaler).
Screen.Recording.2026-05-13.at.11.22.18.AM.mov

Ticket

ADFA-3813

Observation

The domain refactoring drastically reduces the bloat in the ComputerVisionViewModel, effectively decoupling the UI state management from ML detection and XML generation logic.

@jatezzz requested review from a team, Daniel-ADFA and avestaadfa May 13, 2026 16:46
Contributor

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@jatezzz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 23 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eefd1f11-ab17-4244-b333-cf1af0c7ea1a

📥 Commits

Reviewing files that changed from the base of the PR and between b667da3 and 9815c28.

📒 Files selected for processing (4)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionScaler.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/ValueCleanersImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/PrepareImageUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
📝 Walkthrough


The PR refactors the computer-vision architecture by decomposing a monolithic repository and geometry processor into a simplified repository interface, domain use cases, and focused geometry utilities. It removes ComputerVisionRepository/ComputerVisionRepositoryImpl and replaces them with a leaner VisionRepository that delegates to model sources. It extracts layout-geometry responsibilities from LayoutGeometryProcessor into DetectionScaler, TextAssociator, and LayoutTreeBuilder. It introduces five new domain use cases (PrepareImageUC, RunVisionUC, GenerateXmlUC, ImportPlaceholderImageUC, RemovePlaceholderImageUC) that orchestrate the vision and XML-generation pipelines. The ViewModel now consumes these use cases instead of directly managing operations.

Changes

Repository Layer Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| Simplified vision repository contract and implementation: VisionRepository.kt, VisionRepositoryImpl.kt | Replaces the removed ComputerVisionRepository/ComputerVisionRepositoryImpl with a cleaner interface that exposes only model initialization, widget detection, text recognition, and lifecycle methods (initModel(), detectWidgets(bitmap), recognizeText(bitmap), isInitialized(), release()). The new implementation delegates to YoloModelSource and OcrSource. |

Geometry and Layout Utilities Decomposition

| Layer / File(s) | Summary |
| --- | --- |
| Detection scaling utility: DetectionScaler.kt | Extracts the bounding-box scaling logic from the removed LayoutGeometryProcessor into a focused singleton that converts YOLO-normalized boxes into pixel coordinates for a target Android resolution, including clamping and minimum-size enforcement. |
| Text-to-widget association utilities: TextAssociator.kt | Extracts text-association logic from the removed LayoutGeometryProcessor into a singleton with two public methods: assignTextToParents(...) for assigning OCR text to parent widgets via overlap threshold, and assignNearbyTextToWidgets(...) for assigning nearby text with vertical-alignment and proximity scoring. Includes widget labelability checks and widget-type-specific text cleaning. |
| Layout tree builder: LayoutTreeBuilder.kt | Extracts layout-tree construction from the removed LayoutGeometryProcessor into a singleton that groups scaled boxes into rows (using vertical overlap and center-alignment heuristics) and builds higher-level LayoutItem structures (radio/checkbox groups, horizontal rows, simple views). |
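
As an illustration of the scaling step summarized for DetectionScaler.kt, here is a minimal sketch that normalizes a box against the source image and maps it to a target resolution with minimum-size enforcement and clamping. The types `SourceBox` and `ScaledBox`, the function `scaleBox`, and the `MIN_SIZE_PX` value are all hypothetical; the actual DetectionScaler signatures are not shown on this page. The sketch converts to `Float` before dividing, since `Int / Int` would truncate the normalized values.

```kotlin
import kotlin.math.roundToInt

// Hypothetical types for illustration; not the PR's actual classes.
data class SourceBox(val left: Int, val top: Int, val right: Int, val bottom: Int)
data class ScaledBox(val x: Int, val y: Int, val w: Int, val h: Int)

// Assumed minimum edge length in target pixels (not taken from the PR).
const val MIN_SIZE_PX = 8

fun scaleBox(
    box: SourceBox,
    sourceWidth: Int,
    sourceHeight: Int,
    targetWidth: Int,
    targetHeight: Int,
): ScaledBox {
    require(sourceWidth > 0 && sourceHeight > 0) { "source dimensions must be positive" }
    require(targetWidth >= MIN_SIZE_PX && targetHeight >= MIN_SIZE_PX) { "target too small" }

    // Convert to Float before dividing: Int / Int truncates, which would
    // turn every box narrower than the image into a normalized size of 0.
    val normX = box.left.toFloat() / sourceWidth
    val normY = box.top.toFloat() / sourceHeight
    val normW = (box.right - box.left).toFloat() / sourceWidth
    val normH = (box.bottom - box.top).toFloat() / sourceHeight

    // Enforce the minimum size first, then clamp the origin so the
    // whole box stays inside the target area.
    val w = (normW * targetWidth).roundToInt().coerceIn(MIN_SIZE_PX, targetWidth)
    val h = (normH * targetHeight).roundToInt().coerceIn(MIN_SIZE_PX, targetHeight)
    val x = (normX * targetWidth).roundToInt().coerceIn(0, targetWidth - w)
    val y = (normY * targetHeight).roundToInt().coerceIn(0, targetHeight - h)
    return ScaledBox(x, y, w, h)
}
```

For example, under these assumptions `scaleBox(SourceBox(100, 50, 300, 150), 400, 200, 1080, 1920)` yields a box at (270, 480) with size 540 by 960.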

Domain Use Cases Layer

| Layer / File(s) | Summary |
| --- | --- |
| Image preparation use case: PrepareImageUC.kt | New use case that decodes an image URI, applies EXIF-based rotation using TAG_ORIENTATION fallback, computes smart left/right boundary percentages via SmartBoundaryDetector, and returns a Result<PreparedImage> wrapping the bitmap and guide percentages. Handles cancellation semantics correctly. |
| Vision processing orchestration use case: RunVisionUC.kt | New use case that orchestrates YOLO detection, detection resolution via GenericBoxResolver, region OCR with left/right guide percentages, detection merging via DetectionMerger, filtering by bounds, and margin-annotation parsing. Emits progress updates via callback and returns Result<VisionResult> with detections and annotations. Preserves coroutine cancellation semantics. |
| XML generation use case: GenerateXmlUC.kt | New use case that wraps YoloToXmlConverter.generateXmlLayout(...) in runCatching, passing through detections, annotations, image selection map, dimensions, and forcing wrapInScroll = true. Returns Result<Pair<String, String>> for layout and strings XML. |
| Placeholder image import/removal use cases: ImportPlaceholderImageUC.kt, RemovePlaceholderImageUC.kt | New use cases that delegate to DrawableImportHelper: ImportPlaceholderImageUC imports a user-selected gallery image with a fallback name derived from placeholderId; RemovePlaceholderImageUC removes a drawable by resource name. Both return Result-wrapped outcomes. |
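
The summaries above mention Result-returning use cases that preserve coroutine cancellation. Kotlin's plain `runCatching` catches every `Throwable`, including `CancellationException`, so a wrapper that rethrows cancellation is one common way to get that behavior. The helper below is a minimal sketch with a hypothetical name (`runCatchingCancellable`); it is not taken from the PR, whose actual code is not shown here.

```kotlin
import kotlin.coroutines.cancellation.CancellationException

// Hypothetical helper; the name does not come from the PR.
inline fun <T> runCatchingCancellable(block: () -> T): Result<T> =
    try {
        Result.success(block())
    } catch (e: CancellationException) {
        // Let cancellation propagate instead of wrapping it as a failure.
        throw e
    } catch (e: Throwable) {
        Result.failure(e)
    }
```

A use case built this way would report ordinary errors as `Result.failure(...)` while a cancelled coroutine still unwinds normally.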

Converter and XML Generator Updates

| Layer / File(s) | Summary |
| --- | --- |
| YoloToXmlConverter refactoring: YoloToXmlConverter.kt | Removes the LayoutGeometryProcessor dependency and delegates scaling and text-association steps to DetectionScaler and TextAssociator. The constructor now takes only annotationMatcher and xmlGenerator. Methods scaleDetections, associateTextToWidgets, and extractCanvasTags are updated to use the new utility objects instead of the geometry processor. |
| AndroidXmlGenerator refactoring: AndroidXmlGenerator.kt | Removes the LayoutGeometryProcessor dependency and updates the constructor. The buildXml method now derives layout items via LayoutTreeBuilder.buildLayoutTree(boxes) instead of calling the injected geometry processor. |

ViewModel and DI Integration

| Layer / File(s) | Summary |
| --- | --- |
| ComputerVisionViewModel refactoring: ComputerVisionViewModel.kt | Updates constructor to inject VisionRepository and five use cases (PrepareImageUC, RunVisionUC, GenerateXmlUC, ImportPlaceholderImageUC, RemovePlaceholderImageUC) instead of repository implementation details and UI helpers. Image loading now delegates to PrepareImageUC; detection runs through RunVisionUC with progress callbacks; XML generation and export use GenerateXmlUC; placeholder interactions use the import/remove use cases. Event routing, error handling, and resource cleanup are updated accordingly. Method name changed from initializeModel() to initModel() and releaseResources() to release(). |
| DI module wiring: ComputerVisionModule.kt | Updates Koin bindings to register VisionRepository (backed by VisionRepositoryImpl with OcrSource), explicitly registers OcrSource, RegionOcrProcessor, and GenericBoxResolver, and adds singleton registrations for all five use cases. The ComputerVisionViewModel factory is updated to inject the use cases instead of helpers. |

Minor Logic Fix

| Layer / File(s) | Summary |
| --- | --- |
| DrawableCleaner normalization: ValueCleanersImpl.kt | Adds a post-cleanup normalization step to replace the substring im_age with image before producing the final @drawable/... value, improving OCR-based resource name resolution. |
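
The PR description names two OCR misreads of "image": "im_age" and "im". A minimal sketch of a normalization covering both follows. The function names (`normalizeDrawableName`, `toDrawableRef`) are hypothetical, and the sketch assumes the cleaned name is already lowercase snake_case. Because `\b` treats `_` as a word character, the standalone "im" check uses lookarounds on letters and digits instead, so a token such as `im_1` is also normalized (an assumption about the desired behavior).

```kotlin
// Hypothetical names; the actual DrawableCleaner code is not shown on this page.
val splitImage = Regex("im_age")

// "im" as a whole token: not preceded or followed by a letter or digit.
val standaloneIm = Regex("(?<![a-z0-9])im(?![a-z0-9])")

fun normalizeDrawableName(cleaned: String): String =
    cleaned
        .replace(splitImage, "image")    // "im_age" -> "image"
        .replace(standaloneIm, "image")  // "im" -> "image", "im_1" -> "image_1"

fun toDrawableRef(rawValue: String, cleaned: String): String {
    val finalCleaned = normalizeDrawableName(cleaned)
    // Keep the raw value when cleaning leaves nothing usable.
    return if (finalCleaned.isEmpty()) rawValue else "@drawable/$finalCleaned"
}
```

Names that merely contain the letters, such as `dim_bg` or `image`, pass through unchanged because "im" is adjacent to another letter there.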

Sequence Diagram(s)

sequenceDiagram
    participant ViewModel as ComputerVisionViewModel
    participant PrepareUC as PrepareImageUC
    participant RunVisionUC as RunVisionUC
    participant VisionRepo as VisionRepository
    participant YoloSource as YoloModelSource
    participant OCRSource as OcrSource
    participant GenerateUC as GenerateXmlUC
    participant Converter as YoloToXmlConverter

    ViewModel->>PrepareUC: invoke(uri)
    activate PrepareUC
    PrepareUC->>PrepareUC: decode & rotate image
    PrepareUC->>PrepareUC: compute left/right boundaries
    PrepareUC-->>ViewModel: Result<PreparedImage>
    deactivate PrepareUC

    ViewModel->>RunVisionUC: invoke(bitmap, leftPct, rightPct, onProgress)
    activate RunVisionUC
    RunVisionUC->>VisionRepo: detectWidgets(bitmap)
    activate VisionRepo
    VisionRepo->>YoloSource: runInference(bitmap)
    YoloSource-->>VisionRepo: List<DetectionResult>
    VisionRepo-->>RunVisionUC: Result<List<DetectionResult>>
    deactivate VisionRepo
    
    RunVisionUC->>VisionRepo: recognizeText(bitmap)
    activate VisionRepo
    VisionRepo->>OCRSource: recognize(bitmap)
    OCRSource-->>VisionRepo: List<TextBlock>
    VisionRepo-->>RunVisionUC: Result<List<TextBlock>>
    deactivate VisionRepo
    
    RunVisionUC->>RunVisionUC: merge detections & parse annotations
    RunVisionUC-->>ViewModel: Result<VisionResult>
    deactivate RunVisionUC

    ViewModel->>GenerateUC: invoke(detections, annotations, ...)
    activate GenerateUC
    GenerateUC->>Converter: generateXmlLayout(...)
    activate Converter
    Converter->>Converter: scale, associate text, build layout
    Converter-->>GenerateUC: Pair<layoutXml, stringsXml>
    deactivate Converter
    GenerateUC-->>ViewModel: Result<Pair<String, String>>
    deactivate GenerateUC

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • appdevforall/CodeOnTheGo#1185: Directly overlaps with refactoring of YoloToXmlConverter's text/annotation matching and layout generation flow.
  • appdevforall/CodeOnTheGo#1171: Main PR builds on SmartBoundaryDetector and guide-percentage logic for left/right boundary detection in PrepareImageUC.
  • appdevforall/CodeOnTheGo#887: Main PR's handling of left/right guide percentages in vision pipeline directly connects to UI changes that emit UpdateGuides events.

Suggested reviewers

  • Daniel-ADFA
  • avestaadfa
  • hal-eisen-adfa

Poem

🐰 A processor once grand, now split into pieces small,
Detections scale, text finds home, builders heed the call,
Use cases guide the flow, the ViewModel stays light,
Separated concerns dance—refactored just right!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: an OCR metadata parsing fix and a comprehensive refactoring of the Computer Vision domain architecture. |
| Description check | ✅ Passed | The description comprehensively covers the OCR fix and architectural refactoring, explaining both the problem and the solution with specific component names and improvements. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionScaler.kt`:
- Around line 26-27: Integer division in DetectionScaler.kt causes normW and
normH to lose precision; cast operands to floating-point before dividing (e.g.,
convert (rect.right - rect.left) and sourceWidth/sourceHeight to Float/Double)
so normalization uses floating-point math, and ensure normW/normH types match
(Float/Double) where they are used; update the normalization lines referencing
rect, normW, normH, sourceWidth and sourceHeight accordingly.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/ValueCleanersImpl.kt`:
- Around line 167-168: The current cleanup replaces "im_age" but leaves the OCR
variant "im" producing "@drawable/im"; in ValueCleanersImpl (the cleaned ->
finalCleaned flow) update the replacement logic to also normalize standalone
"im" to "image" (use a word-boundary or equivalent check so you don't
accidentally change substrings) before returning "@drawable/$finalCleaned";
ensure you apply this to the same cleaned/finalCleaned variable used in the
return path so empty-value fallback still returns rawValue.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/PrepareImageUC.kt`:
- Around line 45-51: The code in PrepareImageUC.kt that computes orientation
currently catches all Exceptions; narrow this to specific expected exceptions
(e.g., catch (ioe: IOException)) around the
contentResolver.openInputStream(uri)?.use { ... } / ExifInterface(...) call so
only IO problems are swallowed and other unexpected errors still surface;
optionally add an additional catch for SecurityException if permission issues
are possible. Ensure the fallback to ExifInterface.ORIENTATION_NORMAL remains in
the catch block.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt`:
- Line 70: When handling ComputerVisionEvent.UpdateGuides in the ViewModel,
sanitize the incoming event.leftPct and event.rightPct before storing: clamp
both values to the [0f, 1f] range, then order them so leftGuidePct <=
rightGuidePct, and finally call _uiState.update with the normalized values
(leftGuidePct and rightGuidePct) to prevent crossed or out-of-range guides from
affecting downstream filtering and annotation parsing.
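
The last finding above (clamp the guide percentages, then order them) can be sketched as a small pure function. The `Guides` type and `normalizeGuides` name are hypothetical, and the sketch assumes the percentages are fractions in the range 0 to 1, as the review comment states; how the ViewModel stores them is not shown here.

```kotlin
// Hypothetical holder for the two guide positions, as fractions of image width.
data class Guides(val leftPct: Float, val rightPct: Float)

fun normalizeGuides(leftPct: Float, rightPct: Float): Guides {
    // Clamp each value into [0, 1] first.
    val a = leftPct.coerceIn(0f, 1f)
    val b = rightPct.coerceIn(0f, 1f)
    // Then order them so the left guide never exceeds the right guide.
    return Guides(minOf(a, b), maxOf(a, b))
}
```

The normalized pair could then be written into the UI state in a single update. Note that `coerceIn` passes NaN through unchanged, so NaN inputs would need separate handling if they can occur.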
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ce97d103-0463-426a-8c1d-8d278e025120

📥 Commits

Reviewing files that changed from the base of the PR and between b85bbc2 and b667da3.

📒 Files selected for processing (18)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/VisionRepository.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/VisionRepositoryImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionScaler.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/LayoutGeometryProcessor.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/LayoutTreeBuilder.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/TextAssociator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/ValueCleanersImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/GenerateXmlUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/ImportPlaceholderImageUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/PrepareImageUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/RemovePlaceholderImageUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/usecase/RunVisionUC.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidXmlGenerator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
💤 Files with no reviewable changes (3)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/LayoutGeometryProcessor.kt

Encapsulate guide limits, narrow EXIF exceptions, and fix 'im' drawable regex.
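
One way to narrow the EXIF exception handling named in this commit, following the review comment on PrepareImageUC.kt, is sketched below. It is an assumption-laden illustration, not the PR's code: `orientationOrNormal` is a hypothetical helper, and the `readOrientation` lambda stands in for the `contentResolver.openInputStream(uri)?.use { ... }` / `ExifInterface(...)` call so the sketch runs without Android. The constant mirrors the value of `ExifInterface.ORIENTATION_NORMAL`, which is 1.

```kotlin
import java.io.IOException

// Same value as ExifInterface.ORIENTATION_NORMAL in the Android API.
const val ORIENTATION_NORMAL = 1

// Hypothetical helper; `readOrientation` stands in for the real EXIF read
// and returns null when no input stream could be opened.
inline fun orientationOrNormal(readOrientation: () -> Int?): Int =
    try {
        readOrientation() ?: ORIENTATION_NORMAL
    } catch (e: IOException) {
        // Unreadable or truncated stream: fall back to "no rotation".
        ORIENTATION_NORMAL
    } catch (e: SecurityException) {
        // Missing permission for the URI: same fallback.
        ORIENTATION_NORMAL
    }
```

Any other exception type, for example an `IllegalStateException` caused by a programming error, still surfaces to the caller instead of being swallowed.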