Skip to content

Fix wave algorithm boundary conditions to prevent invalid alignments#25

Open
unavailable-2374 wants to merge 4 commits into
thegenemyers:mainfrom
unavailable-2374:main
Open

Fix wave algorithm boundary conditions to prevent invalid alignments#25
unavailable-2374 wants to merge 4 commits into
thegenemyers:mainfrom
unavailable-2374:main

Conversation

@unavailable-2374
Copy link
Copy Markdown

Root cause analysis:

  • Wave algorithm uses diagonal coordinate system where k = x - y
  • For valid coordinates: x = (anti+k)/2 >= 0 and y = (anti-k)/2 >= 0
  • This requires k to be in range [-anti, anti]
  • Original code only checked hgh <= anti (y >= 0), not low >= -anti (x >= 0)

Fixes applied:

  1. Entry point validation (Local_Alignment, Wrap_Around_Alignment, Find_Extension):

    • Add check: while (((anti+low)>>1) < 0) low += 1
    • Constrain minp to -anti (prevents wave expansion into x < 0 region)
    • Constrain maxp to anti (prevents wave expansion into y < 0 region)
    • Return error if no valid diagonals remain (low > hgh)
  2. Internal bounds checking (all 6 wave functions):

    • Add check: if (x < 0 || x < k) mark diagonal as dead
    • Applied to both 0-wave initialization and main wave loops
  3. Safety checks (forward_wave, reverse_wave):

    • Add check: if (avail == 0) return error
    • Catches case where all diagonals fail bounds checking
  4. Early termination (all wave expansion loops):

    • Add check: if (hgh < low) break
    • Prevents infinite loops when all diagonals are trimmed

This multi-layer fix prevents invalid coordinates from being computed, rather than just detecting and skipping them after the fact.

unavailable-2374 and others added 4 commits January 28, 2026 18:23
Root cause analysis:
- Wave algorithm uses diagonal coordinate system where k = x - y
- For valid coordinates: x = (anti+k)/2 >= 0 and y = (anti-k)/2 >= 0
- This requires k to be in range [-anti, anti]
- Original code only checked hgh <= anti (y >= 0), not low >= -anti (x >= 0)

Fixes applied:

1. Entry point validation (Local_Alignment, Wrap_Around_Alignment, Find_Extension):
   - Add check: while (((anti+low)>>1) < 0) low += 1
   - Constrain minp to -anti (prevents wave expansion into x < 0 region)
   - Constrain maxp to anti (prevents wave expansion into y < 0 region)
   - Return error if no valid diagonals remain (low > hgh)

2. Internal bounds checking (all 6 wave functions):
   - Add check: if (x < 0 || x < k) mark diagonal as dead
   - Applied to both 0-wave initialization and main wave loops

3. Safety checks (forward_wave, reverse_wave):
   - Add check: if (avail == 0) return error
   - Catches case where all diagonals fail bounds checking

4. Early termination (all wave expansion loops):
   - Add check: if (hgh < low) break
   - Prevents infinite loops when all diagonals are trimmed

This multi-layer fix prevents invalid coordinates from being computed,
rather than just detecting and skipping them after the fact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip alignments where coordinates exceed sequence bounds:
- path->abpos < 0 or path->aepos > aln->alen
- path->bbpos < 0 or path->bepos > aln->blen
- start >= end for either sequence

This handles edge cases where wave algorithm produces out-of-bounds
coordinates, preventing "Subrange out of bounds" errors in Get_Contig_Piece.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
During each wavefront expansion step, calculate the effective diagonal
range based on the current anti-diagonal position:

Forward wave: curr_anti = mida + 2*(dif+1)
Reverse wave: curr_anti = mida - 2*(dif+1)

Sequence length constraints:
- seq_minp = curr_anti - 2*blen  (ensures y <= blen)
- seq_maxp = 2*alen - curr_anti  (ensures x <= alen)

Take the stricter of user constraints and sequence constraints:
- eff_minp = max(minp, seq_minp)
- eff_maxp = min(maxp, seq_maxp)

Applied to all 6 wavefront functions:
- forward_wave, reverse_wave
- forward_wrap, reverse_wrap
- forward_extend, reverse_extend

Also restored simple boundary check: if (x < 0 || x < k)

This dynamic approach is more precise than static constraints because
the valid diagonal range changes as the wavefront expands.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: Compress_TraceTo8(ovl, 0) silently truncates trace values
exceeding 255 to uint8. With TSPACE=100 <= TRACE_XOVR=125, traces use
8-bit storage. When a 100bp A-interval maps to >255bp on B (large
insertion), the B_consumed value overflows (e.g. 300 -> 44). This
corrupts the trace, causing ALNtoPAF's Compute_Trace_PTS/iter_np to
fail with "Bad alignment between trace points". The error is
input-order dependent because swapping A/B changes which sequence
accumulates large B_consumed values.

Fixes:
- FastGA.c: Add trace overflow detection before Compress_TraceTo8;
  skip alignments with any trace value > 255
- FastGA.c: Fix PATH1/ROOT1 -> PATH2/ROOT2 for GDB extension
  detection of genome2
- ALNtoPAF.c: Remove incorrect alast assignment in coordinate
  validation skip that caused stale sequence data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant