@laurensWe
Member

@laurensWe laurensWe commented Jan 12, 2026

  • Translate step 2 and step 3 of the MATLAB function PreprocessData.m to Python
  • These steps rotate the images so that the striations are vertical and then extract the features
  • Added tests (both unit and integration tests)

@laurensWe laurensWe changed the title from "WIP Feature/step 3" to "Implement the python implementation of PreprocessData (Step 2 and Step 3)" on Jan 13, 2026
Collaborator

@SimoneAriens SimoneAriens left a comment

Good progress! The most pressing changes before merging: the (almost) duplicate files need to be cleaned up, and the main function should take a Mark and return two Marks (see endpoint overview). That way it fits neatly into the pipelines.
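
To make the requested interface concrete, here is a minimal sketch of a function that takes one Mark and returns two. The `Mark` dataclass below is a hypothetical stand-in (the project's real container is not shown in this thread), the name `preprocess_striation_mark` is made up, and the assumption that the two outputs are the aligned mark and the extracted profile is inferred from the later comments, not confirmed.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Mark:
    """Hypothetical stand-in for the project's Mark container."""

    data: np.ndarray
    meta_data: dict = field(default_factory=dict)


def preprocess_striation_mark(mark: Mark) -> tuple[Mark, Mark]:
    """Take one Mark, return (aligned mark, profile mark).

    Both processing steps are placeholders; only the shape of the
    interface is the point of this sketch.
    """
    aligned = Mark(
        data=mark.data.copy(),
        meta_data={**mark.meta_data, "total_angle": 0.0},
    )
    # Placeholder profile: NaN-aware mean along each row.
    profile = Mark(
        data=np.nanmean(aligned.data, axis=1, keepdims=True),
        meta_data=dict(aligned.meta_data),
    )
    return aligned, profile


aligned_mark, profile_mark = preprocess_striation_mark(
    Mark(data=np.arange(6, dtype=float).reshape(2, 3))
)
```

With a Mark-in, Marks-out signature the function can be chained with other pipeline steps without unpacking arrays and scales by hand.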

@github-actions

Diff Coverage

Diff: origin/main..HEAD, staged and unstaged changes

  • packages/scratch-core/src/conversion/filter/__init__.py (100%)
  • packages/scratch-core/src/conversion/filter/gaussian.py (86.2%): Missing lines 188-189,192,194-198
  • packages/scratch-core/src/conversion/filter/mark_filters.py (100%)
  • packages/scratch-core/src/conversion/filter/regression.py (90.5%): Missing lines 122,309,312,327-328,330-331,333-334
  • packages/scratch-core/src/conversion/filter/utils.py (88.9%): Missing lines 28
  • packages/scratch-core/src/conversion/mask.py (100%)
  • packages/scratch-core/src/conversion/preprocess_impression/parameters.py (100%)
  • packages/scratch-core/src/conversion/preprocess_impression/preprocess_impression.py (100%)
  • packages/scratch-core/src/conversion/preprocess_striation/__init__.py (100%)
  • packages/scratch-core/src/conversion/preprocess_striation/alignment.py (78.8%): Missing lines 62-63,66-67,74,151-153,156-157,164,167-169,172,195,199
  • packages/scratch-core/src/conversion/preprocess_striation/parameters.py (100%)
  • packages/scratch-core/src/conversion/preprocess_striation/pipeline.py (86.7%): Missing lines 78-82,91
  • packages/scratch-core/src/conversion/preprocess_striation/shear.py (97.2%): Missing lines 42

Summary

  • Total: 357 lines
  • Missing: 42 lines
  • Coverage: 88%
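
As a quick sanity check on the summary, the per-file "Missing lines" entries can be counted and compared with the reported totals. The helper `count_lines` is a small illustrative function, not part of the coverage tool; the range strings are copied from the list above.

```python
def count_lines(spec: str) -> int:
    """Count the lines in a spec such as '188-189,192,194-198'."""
    total = 0
    for part in spec.split(","):
        if "-" in part:
            start, end = map(int, part.split("-"))
            total += end - start + 1
        else:
            total += 1
    return total


missing = {
    "gaussian.py": "188-189,192,194-198",
    "regression.py": "122,309,312,327-328,330-331,333-334",
    "utils.py": "28",
    "alignment.py": "62-63,66-67,74,151-153,156-157,164,167-169,172,195,199",
    "pipeline.py": "78-82,91",
    "shear.py": "42",
}

total_missing = sum(count_lines(spec) for spec in missing.values())
coverage = (357 - total_missing) / 357
print(total_missing, f"{coverage:.1%}")
```

The counts add up to the 42 missing lines in the summary, and (357 - 42) / 357 is about 88.2%, which the report shows as 88%.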

packages/scratch-core/src/conversion/filter/gaussian.py

Lines 184-202

  184     cropped_mask = mask
  185 
  186     if cut_borders_after_smoothing:
  187         # Calculate sigma for border cropping
! 188         sigma = cutoff_to_gaussian_sigma(cutoff, scan_image.scale_x)
! 189         sigma_int = int(ceil(sigma))
  190 
  191         # Check if there are any masked (invalid) regions
! 192         has_masked_regions = np.any(~mask)
  193 
! 194         if has_masked_regions:
! 195             cropped_data, cropped_mask = remove_zero_border(cropped_data, mask)
! 196         elif sigma_int > 0 and scan_image.height > 2 * sigma_int:
! 197             cropped_data = cropped_data[sigma_int:-sigma_int, :]
! 198             cropped_mask = mask[sigma_int:-sigma_int, :]
  199 
  200     return cropped_data, cropped_mask
  201 
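
The uncovered branch above could be exercised with a small unit test. The sketch below reproduces only the row-cropping rule from lines 196-198 in a standalone, hypothetical helper (`crop_rows_by_sigma`); it takes sigma in pixels directly and leaves out `cutoff_to_gaussian_sigma` and `remove_zero_border`, whose behaviour is not shown here.

```python
from math import ceil

import numpy as np


def crop_rows_by_sigma(data, mask, sigma):
    """Drop ceil(sigma) rows at the top and bottom if enough rows remain."""
    sigma_int = int(ceil(sigma))
    if sigma_int > 0 and data.shape[0] > 2 * sigma_int:
        return data[sigma_int:-sigma_int, :], mask[sigma_int:-sigma_int, :]
    # Too few rows (or sigma of zero): leave the arrays untouched.
    return data, mask


data = np.zeros((10, 4))
mask = np.ones((10, 4), dtype=bool)
cropped_data, cropped_mask = crop_rows_by_sigma(data, mask, sigma=1.2)
small_data, _ = crop_rows_by_sigma(np.zeros((4, 4)), np.ones((4, 4), dtype=bool), sigma=2.0)
```

A test along these lines would cover both the cropping case and the "image too small" case, where the height is not strictly larger than twice the rounded sigma.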

packages/scratch-core/src/conversion/filter/regression.py

Lines 118-126

  118         pad_y = len(kernel_y) // 2
  119         pad_x = len(kernel_x) // 2
  120         padded = np.pad(data, ((pad_y, pad_y), (pad_x, pad_x)), mode="symmetric")
  121     else:
! 122         raise ValueError(
  123             f"Padding mode '{mode}' is not supported. Use 'constant' or 'symmetric'."
  124         )
  125 
  126     # Convolve: Y-direction then X-direction
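
Line 122 is only reached with an unsupported padding mode, so a test that passes one would cover it. The function below is a hypothetical, self-contained imitation of the padding step (the name `pad_for_kernels` and the zero fill for the 'constant' mode are assumptions); a real test would call the project's own function instead.

```python
import numpy as np


def pad_for_kernels(data, kernel_y, kernel_x, mode="symmetric"):
    """Pad `data` by half the kernel length on each side."""
    pad_y = len(kernel_y) // 2
    pad_x = len(kernel_x) // 2
    pad_width = ((pad_y, pad_y), (pad_x, pad_x))
    if mode == "constant":
        # Assumption: constant padding fills with zeros.
        return np.pad(data, pad_width, mode="constant", constant_values=0.0)
    if mode == "symmetric":
        return np.pad(data, pad_width, mode="symmetric")
    raise ValueError(
        f"Padding mode '{mode}' is not supported. Use 'constant' or 'symmetric'."
    )


def raises_value_error(mode):
    """Return True if the given mode is rejected with a ValueError."""
    try:
        pad_for_kernels(np.zeros((4, 4)), np.ones(3), np.ones(5), mode=mode)
    except ValueError:
        return True
    return False


padded = pad_for_kernels(np.zeros((4, 4)), np.ones(3), np.ones(5))
```

In a pytest suite the `raises_value_error` helper would normally be replaced by `pytest.raises(ValueError)`.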

Lines 305-316

  305     try:
  306         # Batch solve is much faster
  307         solutions = np.linalg.solve(A_valid, b_valid)
  308         result[valid_indices] = solutions[:, 0, 0]  # c0 is the smoothed value
! 309     except np.linalg.LinAlgError:
  310         # Pass the tuple as is, but we ensure the recipient expects a 2D tuple.
  311         # This modifies `result` in place.
! 312         _solve_fallback_lstsq(result, lhs_matrix, rhs_prepared, valid_indices)
  313 
  314     return result
  315 

Lines 323-335

  323     ],  # Use ellipsis to allow variadic tuples of index arrays
  324 ) -> None:
  325     """Robust fallback solver using Least Squares for difficult pixels."""
  326     # We explicitly extract y and x to make the 2D logic clear to the reader
! 327     y_idx, x_idx = indices[0], indices[1]
! 328     n_pixels = len(y_idx)
  329 
! 330     for i in range(n_pixels):
! 331         y, x = y_idx[i], x_idx[i]
  332         # lstsq returns (solution, residuals, rank, singular_values)
! 333         sol = np.linalg.lstsq(lhs[y, x], rhs[y, x], rcond=None)[0]
! 334         result_array[y, x] = sol[0, 0]
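
The fallback path (lines 309-334) only runs when the batched `np.linalg.solve` raises, which happens as soon as one system in the batch is exactly singular. The sketch below shows one way to trigger it in a test. It is a simplified imitation: `solve_c0` is a hypothetical name, and unlike the code above it loops over all pixels rather than a list of valid indices.

```python
import numpy as np


def solve_c0(lhs, rhs):
    """Solve lhs[y, x] @ c = rhs[y, x] per pixel and return c[0].

    lhs has shape (H, W, k, k), rhs has shape (H, W, k, 1).
    """
    result = np.full(lhs.shape[:2], np.nan)
    try:
        # Batched solve over all pixels at once.
        result[...] = np.linalg.solve(lhs, rhs)[..., 0, 0]
    except np.linalg.LinAlgError:
        # At least one system is singular: fall back to least squares.
        for y, x in np.ndindex(*lhs.shape[:2]):
            sol = np.linalg.lstsq(lhs[y, x], rhs[y, x], rcond=None)[0]
            result[y, x] = sol[0, 0]
    return result


lhs = np.tile(np.eye(2), (1, 2, 1, 1))  # shape (1, 2, 2, 2), identity per pixel
lhs[0, 1] = [[1.0, 1.0], [1.0, 1.0]]    # make one pixel's system singular
rhs = np.full((1, 2, 2, 1), 2.0)
smoothed = solve_c0(lhs, rhs)
```

For the singular pixel, `lstsq` returns the minimum-norm solution, so the test can still assert a finite value instead of a crash.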

packages/scratch-core/src/conversion/filter/utils.py

Lines 24-32

  24     valid_data = mask & ~np.isnan(data)
  25 
  26     if not np.any(valid_data):
  27         # No valid data at all - return empty arrays
! 28         return (
  29             np.array([]).reshape(0, data.shape[1]),
  30             np.array([], dtype=bool).reshape(0, data.shape[1]),
  31         )
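
Line 28 is reached when nothing in the input is valid. A compact way to see what that branch returns, using only the NumPy calls shown above (the input arrays are made up for illustration):

```python
import numpy as np

data = np.full((3, 4), np.nan)          # every pixel is NaN
mask = np.ones(data.shape, dtype=bool)  # mask alone would accept everything

valid_data = mask & ~np.isnan(data)
if not np.any(valid_data):
    # Empty results that keep the original number of columns.
    empty_data = np.array([]).reshape(0, data.shape[1])
    empty_mask = np.array([], dtype=bool).reshape(0, data.shape[1])
```

Keeping the column count means callers can still read `shape[1]` or stack the result with other arrays of the same width.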

packages/scratch-core/src/conversion/preprocess_striation/alignment.py

Lines 58-71

  58     )
  59 
  60     # Apply shear transformation if angle is non-zero
  61     if not np.isclose(total_angle, 0.0, atol=1e-09):
! 62         total_angle_rad = np.radians(total_angle)
! 63         result_data = shear_data_by_shifting_profiles(
  64             scan_image.data, total_angle_rad, cut_y_after_shift
  65         )
! 66         if mask is not None:
! 67             result_mask = (
  68                 shear_data_by_shifting_profiles(
  69                     mask, total_angle_rad, cut_y_after_shift
  70                 )
  71                 > 0.5

Lines 70-78

  70                 )
  71                 > 0.5
  72             )
  73         else:
! 74             result_mask = None
  75     else:
  76         result_data = scan_image.data
  77         result_mask = mask
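
The `> 0.5` comparison after shearing the mask turns interpolated values back into booleans. The sketch below illustrates the idea on a single column with a plain linear-interpolation shift; `shift_column` is a hypothetical helper and is not how `shear_data_by_shifting_profiles` is implemented.

```python
import numpy as np


def shift_column(values, shift):
    """Shift a 1-D profile by a fractional number of samples (linear interpolation)."""
    idx = np.arange(values.size, dtype=float)
    return np.interp(idx - shift, idx, values.astype(float), left=0.0, right=0.0)


mask_column = np.array([False, False, True, True, True, False])
shifted = shift_column(mask_column, 0.25)  # fractional values appear at the edges
binary = shifted > 0.5                     # back to a boolean mask
```

Pixels at the mask edge end up with values between 0 and 1 after interpolation, and the threshold assigns each of them to the side it mostly overlaps.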

Lines 147-161

  147 
  148         if np.isnan(current_angle):
  149             current_angle = 0.05
  150         else:
! 151             total_angle += current_angle
! 152             total_angle_rad = np.radians(total_angle)
! 153             data_tmp = shear_data_by_shifting_profiles(
  154                 scan_image.data, total_angle_rad, cut_y_after_shift
  155             )
! 156             if mask is not None:
! 157                 mask_tmp = (
  158                     shear_data_by_shifting_profiles(
  159                         mask, total_angle_rad, cut_y_after_shift
  160                     )
  161                     > 0.5

Lines 160-176

  160                     )
  161                     > 0.5
  162                 )
  163             else:
! 164                 mask_tmp = None
  165 
  166             # Check if stuck (same angle as previous iteration)
! 167             if current_angle == previous_angle:
! 168                 break
! 169             previous_angle = current_angle
  170     else:
  171         # Max iterations reached without convergence
! 172         return 0.0
  173 
  174     return total_angle
  175 
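
The uncovered lines 167-172 rely on Python's `for ... else`: the `else` block runs only when the loop finishes without hitting `break`. Here is a minimal sketch of that control flow. `refine_angle`, `step_fn` and the tolerance-based convergence check are assumptions for illustration; only the "stuck" check and the `return 0.0` fallback are taken from the snippet above.

```python
def refine_angle(step_fn, max_iter=20, tol=1e-3):
    """Accumulate angle corrections until they become negligible."""
    total = 0.0
    previous = None
    for _ in range(max_iter):
        current = step_fn(total)
        if abs(current) < tol:
            break  # converged (assumed criterion)
        total += current
        if current == previous:
            break  # stuck: same correction as the previous iteration
        previous = current
    else:
        # Loop ran out of iterations without a break: give up.
        return 0.0
    return total


converged = refine_angle(lambda total: (1.0 - total) / 2)
stuck = refine_angle(lambda total: 0.5)
oscillating = refine_angle(lambda total: 1.0 if int(total) % 2 == 0 else -1.0, max_iter=6)
```

Three tests of this kind (converging, stuck, never converging) would cover the `break` and the `else` branch.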

Lines 191-203

  191     """
  192     # Determine subsampling factor
  193     sub_samp = subsampling_factor
  194     if scan_image.scale_x < 1e-6:
! 195         sub_samp = round(1e-6 / scan_image.scale_x) * subsampling_factor
  196 
  197     # Resample data (only x-dimension, matching MATLAB's resample function)
  198     if sub_samp > 1 and scan_image.width // sub_samp >= 2:
! 199         scan_image, mask = resample_scan_image_and_mask(
  200             scan_image,
  201             mask,
  202             factors=(sub_samp, 1),
  203             only_downsample=True,
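
Line 195 only runs for scans with a pixel size below 1 µm. The arithmetic can be checked in isolation; the two helper names below are hypothetical, while the formulas are copied from lines 193-198.

```python
def effective_subsampling(scale_x, subsampling_factor):
    """Scale the subsampling factor up for pixel sizes below 1e-6 m."""
    sub_samp = subsampling_factor
    if scale_x < 1e-6:
        sub_samp = round(1e-6 / scale_x) * subsampling_factor
    return sub_samp


def should_resample(width, sub_samp):
    """Resample only if at least two columns remain afterwards."""
    return sub_samp > 1 and width // sub_samp >= 2


fine_scan = effective_subsampling(0.25e-6, 2)   # 0.25 µm pixels
coarse_scan = effective_subsampling(2e-6, 2)    # 2 µm pixels
```

A test with a sub-micrometre `scale_x` and a sufficiently wide image would cover both line 195 and the resampling call on line 199.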

packages/scratch-core/src/conversion/preprocess_striation/pipeline.py

Lines 74-86

  74         scale_x = aligned_scan.scale_x
  75         scale_y = aligned_scan.scale_y
  76     else:
  77         # Line profile case (no alignment needed)
! 78         data_aligned = data_filtered
! 79         mask_aligned = mask_filtered
! 80         total_angle = 0.0
! 81         scale_x = scan_image.scale_x
! 82         scale_y = scan_image.scale_y
  83 
  84     # Propagate NaN to adjacent pixels to match MATLAB's asymmetric NaN handling
  85     data_aligned = propagate_nan(data_aligned)

Lines 87-95

  87     # Extract profile: apply mask and compute mean/median along rows
  88     if mask_aligned is not None:
  89         data_for_profile = np.where(mask_aligned, data_aligned, np.nan)
  90     else:
! 91         data_for_profile = data_aligned
  92 
  93     profile = (
  94         np.nanmean(data_for_profile, axis=1)
  95         if params.use_mean
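
The profile extraction can be illustrated on a tiny array. The snippet above is cut off after `if params.use_mean`; the sketch assumes the other branch uses `np.nanmedian`, as the comment on line 87 suggests.

```python
import numpy as np

data_aligned = np.array([[1.0, 2.0, 9.0],
                         [4.0, np.nan, 6.0]])
mask_aligned = np.array([[True, True, False],
                         [True, True, True]])

# Masked-out pixels become NaN so the NaN-aware reductions skip them.
data_for_profile = np.where(mask_aligned, data_aligned, np.nan)

mean_profile = np.nanmean(data_for_profile, axis=1)
median_profile = np.nanmedian(data_for_profile, axis=1)
```

Each row is reduced to one value, so the resulting profile has one entry per row of the aligned image.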

packages/scratch-core/src/conversion/preprocess_striation/shear.py

Lines 38-46

  38     :returns: The sheared depth data.
  39     """
  40     # Skip shear for angles smaller than ~0.1° (0.00175 rad)
  41     if abs(angle_rad) <= 0.00175:
! 42         return depth_data.astype(np.floating).copy()
  43 
  44     height, width = depth_data.shape
  45     center_x, center_y = width / 2, height / 2
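
The early return on line 42 can be hit by passing an angle at or below the threshold. A small sketch of the check (the helper name `is_negligible_shear` is made up; the threshold value is taken from line 41):

```python
import numpy as np

SKIP_THRESHOLD_RAD = 0.00175  # roughly 0.1 degrees, as in the comment on line 40


def is_negligible_shear(angle_rad):
    """True if the angle is small enough to skip the shear."""
    return abs(angle_rad) <= SKIP_THRESHOLD_RAD


depth_data = np.arange(6, dtype=np.int32).reshape(2, 3)
as_float = depth_data.astype(np.float64)  # astype copies by default
```

Since `astype` already returns a new array by default, the additional `.copy()` on line 42 is probably redundant. The sketch uses `np.float64` as an explicit target type, which is an assumption about the intended dtype.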

@github-actions

Code Coverage

| Package | Line Rate | Branch Rate |
| --- | --- | --- |
| . | 96% | 88% |
| comparators | 100% | 100% |
| container_models | 99% | 100% |
| conversion | 100% | 96% |
| conversion.export | 100% | 100% |
| conversion.filter | 90% | 80% |
| conversion.leveling | 100% | 100% |
| conversion.leveling.solver | 100% | 75% |
| conversion.preprocess_impression | 99% | 91% |
| conversion.preprocess_striation | 86% | 60% |
| extractors | 96% | 75% |
| parsers | 98% | 67% |
| parsers.patches | 89% | 60% |
| preprocessors | 95% | 75% |
| processors | 100% | 100% |
| renders | 98% | 50% |
| utils | 91% | 75% |
| Summary | 95% (1536 / 1610) | 79% (182 / 230) |

Minimum allowed line rate is 50%

# Crop to mask bounding box
if result_mask is not None:
    result_data = crop_to_mask(
        np.asarray(result_data, dtype=np.float64), result_mask
Collaborator

Isn't result_data already an array of floats?
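
For context on this question: `np.asarray` returns its input unchanged when it is already an ndarray of the requested dtype, so the call is harmless but only needed if `result_data` might arrive with another dtype. A short check with made-up arrays:

```python
import numpy as np

float_data = np.zeros((2, 3), dtype=np.float64)
int_data = np.zeros((2, 3), dtype=np.int32)

same_object = np.asarray(float_data, dtype=np.float64) is float_data
converted = np.asarray(int_data, dtype=np.float64)
```

If the data is guaranteed to be float64 at this point, the conversion can be dropped without changing behaviour.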

result_data = crop_to_mask(
    np.asarray(result_data, dtype=np.float64), result_mask
)
y_slice, x_slice = _determine_bounding_box(result_mask)
Collaborator

Can you use crop_to_mask here as well?
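
A sketch of what this suggestion could look like: if the data is cropped through a helper, the mask can be cropped through the same helper instead of computing the slices separately. The two functions below are hypothetical re-implementations written for illustration; the project's `crop_to_mask` and `_determine_bounding_box` may differ in signature and edge-case handling.

```python
import numpy as np


def determine_bounding_box(mask):
    """Return (y_slice, x_slice) enclosing all True pixels of a non-empty mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1)


def crop_to_mask(data, mask):
    """Crop `data` to the bounding box of `mask`."""
    y_slice, x_slice = determine_bounding_box(mask)
    return data[y_slice, x_slice]


mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True
data = np.arange(20, dtype=float).reshape(4, 5)

cropped_data = crop_to_mask(data, mask)
cropped_mask = crop_to_mask(mask, mask)  # same helper for the mask itself
```

Note that this sketch raises an IndexError for a mask without any True pixel, so an all-False mask would need separate handling.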

)

# Build meta_data with mask and total_angle
aligned_meta_data = {
Collaborator

also add the params here and for the profile:
mark.meta_data.update(**asdict(params))
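
A minimal illustration of the suggested call, assuming the parameters are a dataclass. The class `StriationParams` and its fields are placeholders chosen for this example, not the project's actual parameter class.

```python
from dataclasses import asdict, dataclass


@dataclass
class StriationParams:
    """Hypothetical parameter container."""

    use_mean: bool = True
    subsampling_factor: int = 1


params = StriationParams()
meta_data = {"total_angle": 0.3}

# asdict() turns the dataclass into a plain dict; update() merges it in.
meta_data.update(**asdict(params))
```

Because `asdict` returns string keys, `meta_data.update(asdict(params))` would give the same result. Parameter names that already exist in `meta_data` are overwritten, so name clashes are worth checking.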
