feat: initial shape-scan implementation #1
Merged
Adds entropy and topological-shape file scanner:
- Shannon entropy (whole-file + sliding window) in bits/byte
- Byte-bigram transition graph with density, joint/conditional entropy, per-row entropy stats, and stable structural fingerprint
- Section-aware analysis for ELF, PE, and Mach-O via goblin
- CLI with scan/shape/entropy subcommands (text/json/markdown output)
- Heuristic risk score with documented weights and small-file dampening
- 12 unit tests covering entropy, shape, scoring, and I/O
- GitHub Actions CI (fmt + clippy + test on linux/macos/windows)
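For readers unfamiliar with the entropy measurement named in the list above, here is a minimal sketch of whole-file and sliding-window Shannon entropy in bits/byte. It is not taken from the PR's source: the function names `shannon_entropy` and `sliding_entropy`, the `step` parameter, and the choice to skip a trailing partial window are all assumptions made for illustration.

```rust
/// Shannon entropy of `data` in bits per byte.
/// Returns 0.0 for empty input; the maximum is 8.0 (uniform bytes).
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    // H = -sum p * log2(p) over byte values that actually occur.
    let mut h = 0.0_f64;
    for &c in counts.iter().filter(|&&c| c > 0) {
        let p = c as f64 / n;
        h -= p * p.log2();
    }
    h
}

/// Entropy of every full `window`-byte slice, advancing `step` bytes each time.
/// Hypothetical signature: the real tool's windowing policy may differ.
pub fn sliding_entropy(data: &[u8], window: usize, step: usize) -> Vec<f64> {
    assert!(window > 0 && step > 0, "window and step must be non-zero");
    data.windows(window)
        .step_by(step)
        .map(shannon_entropy)
        .collect()
}
```

A profile in which most windows sit close to 8 bits/byte is the usual hint of compressed or encrypted content, which is the signal the description refers to.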
Summary
Initial implementation of shape-scan, a Rust CLI that measures the entropy and topological "shape" of files to flag suspicious binaries (packed/encrypted payloads).

What it computes per file:
- Byte-bigram transition graph: nodes |V| and edges |E|
- Graph density: |E| / 65 536
- Conditional entropy H(b_{i+1} | b_i) (bits/byte)
- Section-aware analysis for ELF, PE, and Mach-O via goblin; falls back to a single <file> pseudo-section for unknown formats.
- A risk score in [0.0, 1.0] with a transparent, additive weighting (documented in the README) and a low/medium/high bucket.

CLI:
- shape-scan scan <paths...>: full scan; supports -r recursion, --min-risk, --format text|json|markdown, -j parallel workers (rayon), --max-size-mib.
- shape-scan shape <path>: only the topology report.
- shape-scan entropy <path> [--window N]: only the entropy profile.
- Exit codes: 0 clean, 1 at least one high-risk file, 2 error.

Honest framing: the README is explicit that this is a triage signal, not a malware verdict; entropy/shape heuristics are well-known and sophisticated malware can be tuned to evade them.
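To make the graph quantities in the summary concrete, the following is a rough sketch of how |V|, |E|, the density |E| / 65 536, and the conditional entropy H(b_{i+1} | b_i) could be derived from byte-bigram counts. The `Shape` struct and `bigram_shape` function are hypothetical names, and treating |V| as the number of distinct byte values present is an assumption; the PR's actual code may define these differently.

```rust
/// Hypothetical summary of a buffer's byte-bigram transition graph.
pub struct Shape {
    pub nodes: usize,      // |V|: distinct byte values present (assumed definition)
    pub edges: usize,      // |E|: distinct ordered pairs (b_i, b_{i+1}) observed
    pub density: f64,      // |E| / 65 536, the fraction of possible transitions used
    pub cond_entropy: f64, // H(b_{i+1} | b_i) in bits/byte
}

pub fn bigram_shape(data: &[u8]) -> Shape {
    let mut seen = [false; 256];
    let mut counts = vec![0u64; 256 * 256];
    for &b in data {
        seen[b as usize] = true;
    }
    // Count each adjacent pair; row = current byte, column = next byte.
    for w in data.windows(2) {
        counts[((w[0] as usize) << 8) | (w[1] as usize)] += 1;
    }
    let nodes = seen.iter().filter(|&&s| s).count();
    let edges = counts.iter().filter(|&&c| c > 0).count();
    let total = data.len().saturating_sub(1) as f64; // number of bigrams

    // H(next | cur) = -sum over (a, b) of p(a, b) * log2 p(b | a).
    let mut h = 0.0_f64;
    if total > 0.0 {
        for row in counts.chunks(256) {
            let row_total: u64 = row.iter().sum();
            if row_total == 0 {
                continue;
            }
            for &c in row.iter().filter(|&&c| c > 0) {
                let p_joint = c as f64 / total;
                let p_cond = c as f64 / row_total as f64;
                h -= p_joint * p_cond.log2();
            }
        }
    }
    Shape {
        nodes,
        edges,
        density: edges as f64 / 65_536.0,
        cond_entropy: h,
    }
}
```

Structured data tends to use few of the 65 536 possible transitions and has low conditional entropy, while packed or encrypted data pushes both figures up. The 256 x 256 count table is heap-allocated here only to keep the sketch simple.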
Local verification:
CI runs cargo fmt --check, cargo clippy -D warnings, cargo test, and cargo build --release on Linux, macOS, and Windows. 12 unit tests cover entropy, shape, scoring, and I/O.

Review & Testing Checklist for Human
- Check that the weights in src/scan.rs::score (and the matching table in the README) match the trade-offs you want; they're tunable knobs and worth a glance.
- Run the scanner against a real binary and inspect the per-section output, e.g. shape-scan scan /path/to/binary.exe --format json | jq '.[0].sections'.
- CI uses dtolnay/rust-toolchain@stable, so a stable-Rust regression in a transitive dep would surface here.
- Cargo.toml is ready (license, description, keywords) but you'll want to claim the name and bump the version before publishing.

Notes
- Cargo.lock is committed because this is a binary crate; remove it later if you decide to ship shape-scan only as a library.
- The gh integration token couldn't set the default branch to main, but the main ref was created via the GitHub API, so the PR base is correct. You may want to confirm main is the default branch in repo settings.

Link to Devin session: https://app.devin.ai/sessions/65678b8d19d74d5b97392a05c0f7d416
Requested by: @DimaMenetro