@zimonitrome zimonitrome commented Sep 30, 2025

Summary

This PR adds optional file-node scaling by file size via -S / --scale-by-file-size.

When -S is not set, existing Gource behavior is unchanged.

Related issues: #91 #54 #147 #223

Screencast.from.2026-02-10.17.22.23_trimmed.webm

What changed

  • Added file-size-based node scaling (-S / --scale-by-file-size).
  • Added smooth interpolation when file size changes (so node size transitions are animated).
  • Added a dedicated packing solver for scaled nodes to reduce wobble and avoid persistent overlap in dense directories.
  • Added deterministic small-cluster shaping to prevent line/chain layouts for small groups of files.
  • Added optional hover text for file size (--show-file-size-on-hover).
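The size-transition animation above can be pictured as easing the drawn size toward a target each frame. This is a minimal sketch, not the PR's code: `AnimatedSize`, `stepSize`, and the `rate` constant are hypothetical names and values chosen for illustration.

```cpp
#include <cmath>

// Hypothetical state for one file node: the size currently drawn and the
// size it should settle at after the latest change.
struct AnimatedSize {
    float current;
    float target;
};

// Moves `current` toward `target` with exponential easing. Using
// 1 - exp(-rate * dt) as the blend factor keeps the motion independent of
// frame rate. `rate` (per second) is an illustrative constant.
void stepSize(AnimatedSize& s, float dt, float rate = 4.0f) {
    float blend = 1.0f - std::exp(-rate * dt);
    s.current += (s.target - s.current) * blend;
}
```

Calling `stepSize` once per frame with the frame's delta time would let a node grow or shrink gradually instead of snapping to its new size.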

Git size lookup changes

  • Reworked Git size lookup to index blob sizes up front using:
    • git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
  • During parsing, file sizes are looked up from this in-memory index (no per-event subprocess calls).
  • Git raw log command now uses --abbrev=40 so blob hashes match the prebuilt index.
  • Existing log file compatibility:
    • If scaling is OFF, older/partial raw lines continue to parse.
    • If scaling is ON and required hash metadata is missing, parsing fails with a descriptive error.
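As a rough illustration of the up-front index described above, the sketch below reads `git cat-file --batch-check` output in the `%(objectname) %(objecttype) %(objectsize)` format and keeps only blobs. The names `buildBlobSizeIndex` and `lookupBlobSize` are hypothetical and not taken from the PR; running the Git subprocess is left out so the parsing can be shown on its own.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: builds a hash -> size index from lines of the form
// "<object hash> <object type> <object size>".
std::unordered_map<std::string, std::uint64_t> buildBlobSizeIndex(std::istream& in) {
    std::unordered_map<std::string, std::uint64_t> index;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string hash, type;
        std::uint64_t size = 0;
        // Skip malformed lines and non-blob objects (commits, trees, tags).
        if (!(fields >> hash >> type >> size) || type != "blob") continue;
        index[hash] = size;
    }
    return index;
}

// Returns true and sets `size` when the full hash is present in the index.
bool lookupBlobSize(const std::unordered_map<std::string, std::uint64_t>& index,
                    const std::string& hash, std::uint64_t& size) {
    auto it = index.find(hash);
    if (it == index.end()) return false;
    size = it->second;
    return true;
}
```

Because the index is keyed by full object names, the hashes in the raw log need to be unabbreviated, which is why the log command passes --abbrev=40. A failed lookup is where the "scaling ON but metadata missing" error path would apply.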

Custom log format

Custom log supports optional file_size as a 6th field:

timestamp|username|type|file|colour|file_size

file_size is used for A (added) and M (modified) actions.
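A sketch of how a reader of this format might treat the optional 6th field. `CustomLogEntry`, `splitPipes`, and `parseCustomLogLine` are hypothetical names, and the handling of short or malformed lines is an assumption rather than Gource's actual parser behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>
#include <vector>

// Hypothetical record for one custom log line.
struct CustomLogEntry {
    long long timestamp = 0;
    std::string username, type, file, colour;
    bool has_file_size = false;
    std::uint64_t file_size = 0;
};

// Splits on '|' and keeps empty fields, so "a||b" yields three entries.
static std::vector<std::string> splitPipes(const std::string& line) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = line.find('|', start);
        if (pos == std::string::npos) { out.push_back(line.substr(start)); break; }
        out.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// Accepts 4 to 6 fields; colour and file_size are optional.
// file_size is only recorded for A and M actions.
bool parseCustomLogLine(const std::string& line, CustomLogEntry& entry) {
    entry = CustomLogEntry{};
    std::vector<std::string> f = splitPipes(line);
    if (f.size() < 4 || f.size() > 6) return false;
    try {
        entry.timestamp = std::stoll(f[0]);
        entry.username = f[1];
        entry.type = f[2];
        entry.file = f[3];
        if (f.size() >= 5) entry.colour = f[4];
        if (f.size() == 6 && !f[5].empty() && (entry.type == "A" || entry.type == "M")) {
            entry.file_size = std::stoull(f[5]);
            entry.has_file_size = true;
        }
    } catch (const std::exception&) {
        return false;  // non-numeric timestamp or file_size
    }
    return true;
}
```

For example, `1700000000|alice|A|src/main.cpp|FF0000|2048` would parse with a file size of 2048, while a 4-field line without colour or size would still parse with `has_file_size` left false.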

CLI/options added

Primary feature flag:

  • -S, --scale-by-file-size

Advanced tuning options:

  • --file-scale
  • --dir-spacing
  • --file-gravity
  • --file-repulsion
  • --show-file-size-on-hover

Defaults are tuned so -S works reasonably without extra tuning.

Notes

Current scaling uses a logarithmic mapping for readability across large size ranges.
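To show why a logarithmic mapping helps here, the sketch below maps a size in bytes to a node size that grows by a fixed step each time the file doubles. The function name, the base size of 1, the divisor of 10, and the way `file_scale` multiplies the result are all illustrative assumptions; the PR does not state its exact formula or how --file-scale enters it.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical mapping from file size to node size.
// log2(1 + bytes) is 0 for an empty file and increases by about 1 per
// doubling, so a 1 KiB file and a 1 GiB file differ by a small factor
// instead of a factor of a million.
float scaledNodeSize(std::uint64_t bytes, float file_scale = 1.0f) {
    float doublings = std::log2(1.0f + static_cast<float>(bytes));
    return file_scale * (1.0f + doublings / 10.0f);
}
```

With these constants an empty file keeps size 1, a file of about 1 KiB is roughly 2, and a file of about 1 MiB is roughly 3.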

@acaudwell
Owner

Cool looking feature. Thanks for the detailed overview of the changes.

@zimonitrome
Author

zimonitrome commented Oct 4, 2025

I have now had some time to test the PR a bit more and I think the physics in particular needs some more tweaking before it could be deemed "stable". Examples:

  • Adding many files in a single dir causes nodes to constantly "bounce" in and out of the cluster.
  • Big files in large clusters sometimes overlap.
  • Small files in large clusters sometimes never separate.

That being said, it still works fine for most common projects I have tested it on.

I have some ideas on how to stabilize it somewhat, like adding an outward force or a more "spring-like" gravity model.

For now I made a few changes to the PR:

  • Reverted the regex (I used an old compiler)
  • Fixed a bug in the file size lookup. Previously it only worked when the repository was the current working directory.

I will try to improve it and report back.

if (gGourceSettings.scale_by_file_size) {
    for (RFile* f : files) {
        if (!f->isHidden()) {
            float r = f->getSize() / 2.0f;

So this sets the radius in proportion to file size. That means the drawn area of each dot will be proportional to file size squared, not to file size.
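One way to act on this, sketched under the assumption that area-proportional dots are the goal: derive the radius from the square root of the size, since a circle's area is π·r². `radiusForArea` is a hypothetical helper, and `size` stands for whatever value `getSize()` returns in the excerpt above.

```cpp
#include <cmath>

// Hypothetical helper: returns the radius of a circle whose area equals
// `size`, so that doubling `size` doubles the drawn area rather than
// quadrupling it.
float radiusForArea(float size) {
    const float kPi = 3.14159265f;
    return std::sqrt(size / kPi);
}
```

With this mapping a node four times larger gets twice the radius, which keeps the visual area in step with the size value.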


if (gGourceSettings.scale_by_file_size && status != "D") {
    char cmd_buff[2048];
    int written = snprintf(cmd_buff, 2048, "git --git-dir=%s/.git --work-tree=%s cat-file -s %s", m_repository_path.c_str(), m_repository_path.c_str(), dst_blob.c_str());
Owner

@acaudwell acaudwell Nov 17, 2025

Doing this here seems to cause a performance issue, as it blocks the UI while fetching the blob size. It also won't work if the logfile is a file and not a directory. You can see it block when you move the mouse cursor over the interactive timeline, as it peeks at the part of the log under the cursor to get the timestamp.

For performance, I think it would be better to get all the blob sizes up front (maybe just get the size of every blob in the repo at once) right after we generate the git log, put them into a data structure, and then look them up here.

Author

@zimonitrome zimonitrome Feb 10, 2026

Implemented. I removed the per-event git cat-file calls and replaced them with a one-time blob-size index build right after log generation, followed by O(1) lookups during parsing. This should avoid the timeline-hover/UI stalls, I think, though it may need more testing.

@zimonitrome
Author

zimonitrome commented Feb 10, 2026

I updated the branch to make the "physics layout" more stable and took in some review feedback.

Main improvements since earlier revisions:

  • Replaced per-file blob size subprocess calls with a one-time Git blob index build (git cat-file --batch-all-objects ...) and in-memory lookups.
  • Added compatibility behavior for existing/older raw log files:
    • no scaling => continue parsing
    • scaling enabled but missing blob metadata => descriptive error
  • Added --abbrev=40 to the Git raw log command so blob hashes consistently match the index.
  • Added file size transition animation (interpolated node size changes).
  • Reworked scaled-node directory packing to reduce wobble/overlap in large clusters and avoid chain-like minima in smaller clusters.

I updated my OP to reflect the changes and added a new video. Some behavior may still need changing, for example the distance between directories (clusters).

Might do a bit more testing but it seems pretty nice all in all right now!
