diff --git a/.jules/bolt.md b/.jules/bolt.md
index 541c0ab51..c9622329d 100644
--- a/.jules/bolt.md
+++ b/.jules/bolt.md
@@ -17,3 +17,7 @@
 ## 2025-05-19 - File Discovery Allocations
 **Learning:** In `discover_importing_files`, `WalkBuilder` results were being converted to `PathBuf` via `.map(|e| e.into_path())` *before* filtering. This caused allocations for every single file in the workspace (including excluded files and directories).
 **Action:** Filter `ignore::DirEntry` directly using `entry.file_type()` and `entry.path()` before mapping to `PathBuf`. This avoids allocations for non-matching files.
+
+## 2024-05-31 - Heuristic Character Loop Early Exits
+**Learning:** Heuristic functions that count characters or bytes (e.g., `is_likely_code_text` checking for binary files) iterate over sample chunks. When the threshold for a boolean result (like returning `false` for too many non-printable characters) is reached early in the sample, continuing to iterate over the rest of the chunk wastes CPU cycles. This is especially impactful for mismatch cases (large binary files).
+**Action:** When writing loop-based threshold checks, always add an early exit condition inside the loop (e.g., `if count >= threshold { return false; }`) to avoid scanning the entire chunk unnecessarily.
diff --git a/crates/mill-services/src/services/reference_updater/detectors/generic.rs b/crates/mill-services/src/services/reference_updater/detectors/generic.rs
index bea6dc076..d82306d3c 100644
--- a/crates/mill-services/src/services/reference_updater/detectors/generic.rs
+++ b/crates/mill-services/src/services/reference_updater/detectors/generic.rs
@@ -519,12 +519,18 @@ fn is_likely_code_text(content: &str) -> bool {
     let sample = &content.as_bytes()[..sample_len];
     let mut non_printable = 0usize;
     for &b in sample {
-        let is_text = b == b'\n' || b == b'\r' || b == b'\t' || (b >= 0x20 && b <= 0x7e);
+        let is_text = b == b'\n' || b == b'\r' || b == b'\t' || (0x20..=0x7e).contains(&b);
         if !is_text {
             non_printable += 1;
+            // ⚡ Bolt: Early exit to avoid scanning the entire chunk for binary files.
+            // This is especially beneficial for mismatch cases as we stop immediately
+            // once the threshold for failure is reached.
+            if non_printable * 20 >= sample_len {
+                return false;
+            }
         }
     }
-    non_printable * 20 < sample_len
+    sample_len > 0
 }
 
 fn is_obviously_irrelevant_extension(ext: &str) -> bool {
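
The change above replaces a single comparison after the loop with an in-loop check plus a final `sample_len > 0`. The following is a minimal sketch of why the two forms should agree, not the crate's actual code: `SAMPLE_LIMIT` and the way `sample_len` is derived are assumptions (the diff does not show them), and the names `is_text_byte`, `full_scan`, and `early_exit` are hypothetical.

```rust
/// Hypothetical cap on the sample size; the real value is not visible in the diff.
const SAMPLE_LIMIT: usize = 4096;

/// Same byte classification as the diff: newline, CR, tab, or printable ASCII.
fn is_text_byte(b: u8) -> bool {
    b == b'\n' || b == b'\r' || b == b'\t' || (0x20..=0x7e).contains(&b)
}

/// Pre-change shape: count over the whole sample, then compare once.
fn full_scan(content: &str) -> bool {
    // Assumption: the sample is the first SAMPLE_LIMIT bytes of the content.
    let sample_len = content.len().min(SAMPLE_LIMIT);
    let sample = &content.as_bytes()[..sample_len];
    let non_printable = sample.iter().filter(|&&b| !is_text_byte(b)).count();
    non_printable * 20 < sample_len
}

/// Post-change shape: return `false` as soon as the 5% threshold is reached.
fn early_exit(content: &str) -> bool {
    let sample_len = content.len().min(SAMPLE_LIMIT);
    let sample = &content.as_bytes()[..sample_len];
    let mut non_printable = 0usize;
    for &b in sample {
        if !is_text_byte(b) {
            non_printable += 1;
            if non_printable * 20 >= sample_len {
                return false;
            }
        }
    }
    // Reaching this point means the count stayed below the threshold,
    // so only the empty sample (loop never ran) still has to be rejected.
    sample_len > 0
}
```

Under these assumptions the two functions return the same result for every input: the count only grows, so the threshold is crossed inside the loop exactly when the final comparison would have failed. The trailing `sample_len > 0` preserves the old result for empty input, where `0 * 20 < 0` was `false`.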