From 3ed606ed8cf854c6ddcf51d6e55db65c46311567 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]"
 <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Thu, 11 Jun 2026 10:08:47 +0000
Subject: [PATCH] =?UTF-8?q?=E2=9A=A1=20Bolt:=20Add=20early=20exit=20to=20i?=
 =?UTF-8?q?s=5Flikely=5Fcode=5Ftext?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds an early exit inside the byte-scanning loop of `is_likely_code_text`. Once non-printable bytes reach 5% of the sample (`non_printable * 20 >= sample_len`), the content is likely binary, so the function returns `false` immediately instead of scanning the rest of the sample. The final expression becomes `sample_len > 0`, which preserves the previous result both for samples that stay below the threshold and for empty samples. This saves CPU cycles, particularly when processing large binary files during file search operations.

Also replaces the manual range check `b >= 0x20 && b <= 0x7e` with `(0x20..=0x7e).contains(&b)` to satisfy the `clippy::manual_range_contains` lint.

Co-authored-by: mudcube <101564+mudcube@users.noreply.github.com>
---
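
Note for reviewers: the sketch below illustrates why replacing the final comparison with `sample_len > 0` should not change results. It is a standalone approximation, not the patched file: `is_text_byte`, `full_scan`, and `early_exit` are hypothetical names, and both functions take a byte slice directly rather than the `&str` and sampling step used by `is_likely_code_text`.

```rust
/// Same byte classification as the patched loop body.
fn is_text_byte(b: u8) -> bool {
    b == b'\n' || b == b'\r' || b == b'\t' || (0x20..=0x7e).contains(&b)
}

/// Pre-patch shape: count every non-text byte, compare once at the end.
fn full_scan(sample: &[u8]) -> bool {
    let non_printable = sample.iter().filter(|&&b| !is_text_byte(b)).count();
    non_printable * 20 < sample.len()
}

/// Post-patch shape: return as soon as the 5% threshold is reached.
fn early_exit(sample: &[u8]) -> bool {
    let sample_len = sample.len();
    let mut non_printable = 0usize;
    for &b in sample {
        if !is_text_byte(b) {
            non_printable += 1;
            if non_printable * 20 >= sample_len {
                return false;
            }
        }
    }
    // Reaching this point means the count stayed under the threshold,
    // so only the empty sample still has to be rejected.
    sample_len > 0
}

fn main() {
    let cases: [&[u8]; 5] = [
        b"",                          // empty sample
        b"fn main() {}\n",            // plain text
        &[0u8; 64],                   // all non-text bytes
        b"abcdefghijklmnopqrs\x00",   // 1 of 20 bytes non-text: exactly 5%
        b"abcdefghijklmnopqrst\x00",  // 1 of 21 bytes non-text: just under 5%
    ];
    for &case in &cases {
        assert_eq!(full_scan(case), early_exit(case));
    }
    println!("both forms agree on {} samples", cases.len());
}
```

The two boundary cases (exactly 5% and just under it) are the ones most likely to expose an off-by-one between `<` in the old return expression and `>=` in the new early exit.
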
 .jules/bolt.md                                         |  4 ++++
 .../services/reference_updater/detectors/generic.rs    | 10 ++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/.jules/bolt.md b/.jules/bolt.md
index 541c0ab51..c9622329d 100644
--- a/.jules/bolt.md
+++ b/.jules/bolt.md
@@ -17,3 +17,7 @@
 ## 2025-05-19 - File Discovery Allocations
 **Learning:** In `discover_importing_files`, `WalkBuilder` results were being converted to `PathBuf` via `.map(|e| e.into_path())` *before* filtering. This caused allocations for every single file in the workspace (including excluded files and directories).
 **Action:** Filter `ignore::DirEntry` directly using `entry.file_type()` and `entry.path()` before mapping to `PathBuf`. This avoids allocations for non-matching files.
+
+## 2024-05-31 - Heuristic Character Loop Early Exits
+**Learning:** Heuristic functions that count characters or bytes (e.g., `is_likely_code_text` checking for binary files) iterate over sample chunks. When the threshold for a boolean result (like returning `false` for too many non-printable characters) is reached early in the sample, continuing to iterate over the rest of the chunk wastes CPU cycles. This is especially impactful for mismatch cases (large binary files).
+**Action:** When writing loop-based threshold checks, always add an early exit condition inside the loop (e.g., `if count >= threshold { return false; }`) to avoid scanning the entire chunk unnecessarily.
diff --git a/crates/mill-services/src/services/reference_updater/detectors/generic.rs b/crates/mill-services/src/services/reference_updater/detectors/generic.rs
index bea6dc076..d82306d3c 100644
--- a/crates/mill-services/src/services/reference_updater/detectors/generic.rs
+++ b/crates/mill-services/src/services/reference_updater/detectors/generic.rs
@@ -519,12 +519,18 @@ fn is_likely_code_text(content: &str) -> bool {
     let sample = &content.as_bytes()[..sample_len];
     let mut non_printable = 0usize;
     for &b in sample {
-        let is_text = b == b'\n' || b == b'\r' || b == b'\t' || (b >= 0x20 && b <= 0x7e);
+        let is_text = b == b'\n' || b == b'\r' || b == b'\t' || (0x20..=0x7e).contains(&b);
         if !is_text {
             non_printable += 1;
+            // ⚡ Bolt: Early exit. Once non-printable bytes reach 5% of the sample
+            // (`non_printable * 20 >= sample_len`), the result is already `false`,
+            // so scanning the rest of the chunk is wasted work.
+            if non_printable * 20 >= sample_len {
+                return false;
+            }
         }
     }
-    non_printable * 20 < sample_len
+    sample_len > 0 // Still under the threshold after the full scan; empty samples stay rejected.
 }
 
 fn is_obviously_irrelevant_extension(ext: &str) -> bool {