Skip to content

[Branch Hinting] Add binary support#7572

Merged
kripken merged 74 commits into
WebAssembly:mainfrom
kripken:branch.hint.binary
May 8, 2025
Merged

[Branch Hinting] Add binary support#7572
kripken merged 74 commits into
WebAssembly:mainfrom
kripken:branch.hint.binary

Conversation

@kripken
Copy link
Copy Markdown
Member

@kripken kripken commented May 6, 2025

No description provided.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 6, 2025

Note: I have yet to confirm the binary format here interops with others, see WebAssembly/branch-hinting#31

@kripken kripken marked this pull request as draft May 7, 2025 15:41
@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2025

Converting to draft as the order was wrong here: unlike DWARF, this section must appear before the code. I did that in the reader, but doing it in the writer will take some interesting plumbing...

@tlively
Copy link
Copy Markdown
Member

tlively commented May 7, 2025

Since we'll apparently have to assemble the output in chunks, maybe it's a good opportunity to investigate parallelizing binary writing for functions.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2025

Yeah, that can make sense after #7575 which allows us to work on side buffers - we could emit functions in parallel ones.

@kripken kripken marked this pull request as ready for review May 7, 2025 18:54
@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2025

This is now ready, but includes parts of #7575 which should land first.

Comment thread src/wasm/wasm-binary.cpp Outdated

auto funcIter = binaryLocations.functions.find(func);
assert(funcIter != binaryLocations.functions.end());
auto funcDeclarations = funcIter->second.declarations;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this funcOffset instead? That will be clearer when we subtract it from the expr offset below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the comment so it is right above this code, and I think now makes it explanatory? I.e. the code is now doing literally what the comment says, "take the offset relative to function declarations"

Comment thread src/wasm/wasm-binary.cpp Outdated
Comment on lines +1911 to +1916
if (DWARF) {
// Do not break, so we keep looking for DWARF.
continue;
} else {
break;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this conditional selection between continue and break is worth the complexity. Can we just unconditionally continue here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it means we pre-scan through the entire binary in the common case, but I measured it now and the difference is maybe 1%, so I agree it's not worth it. Done.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2025

(rebased, this no longer has parts of another PR)

Comment thread src/wasm/wasm-binary.cpp
Comment on lines +489 to +491
std::move_backward(&o[sectionStart - 1], &o[oldSize], o.end());
std::copy(
annotationsBuffer.begin(), annotationsBuffer.end(), &o[sectionStart - 1]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler if we wrote sections to several buffers and then concatenated them at the end rather than shifting contents around within buffers. That would be a bigger change, though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and possibly less efficient (though I'm not sure, maybe reallocating a single buffer is worse?)

Comment thread src/wasm/wasm-binary.cpp Outdated

// We compute the location of the function declaration area (where the
// locals are declared) the first time we need it.
BinaryLocation funcDeclarations = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this funcDeclarationsOffset or similar so it makes more sense when we subtract it from exprOffset below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

Comment thread src/wasm/wasm-binary.cpp Outdated
}
}

if (!funcHints.exprHints.empty()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's turn this into an early exit:

Suggested change
if (!funcHints.exprHints.empty()) {
if (funcHints.exprHints.empty()) {
continue;
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread src/wasm/wasm-binary.cpp Outdated
// Write the final size. We can ignore the return value, which is the number
// of bytes we shrank (if the LEB was smaller than the maximum size), as no
// value in this section cares.
(void)buffer.emitRetroactiveSectionSizeLEB(lebPos);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this void cast to avoid compiler errors? It doesn't seem like we should.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the return value isn't marked as erroring if unused, yeah, and the comment should be enough. Done.

Comment thread src/wasm/wasm-binary.cpp Outdated
readDylink(payloadLen);
} else if (sectionName.equals(BinaryConsts::CustomSections::Dylink0)) {
readDylink0(payloadLen);
} else if (sectionName.equals(Annotations::BranchHint.str)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both sides are Name, can't we use ==?

Suggested change
} else if (sectionName.equals(Annotations::BranchHint.str)) {
} else if (sectionName == Annotations::BranchHint) {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread src/wasm/wasm-binary.cpp
Comment on lines +5255 to +5256
auto funcIndex = getU32LEB();
auto& func = wasm.functions[funcIndex];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should guard against OOB indices here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

Comment thread test/lit/wat-annotations.wast Outdated

;; RUN: wasm-opt -all %s -S -o - | filecheck %s
;; RUN: wasm-opt -all %s -S -o - | filecheck %s
;; RUN: wasm-opt -all --roundtrip %s -S -o - | filecheck %s --check-prefix=BINARY
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we use RTRIP as the check prefix so it has the same number of characters as CHECK.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread src/wasm/wasm-binary.cpp
Comment on lines +1611 to +1613
// TODO: This mode may not matter (when debugging, code annotations are an
// optimization that can be skipped), but atm source maps cause
// annotations to break.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the breakage?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I only noticed it by chance, and added the TODO to investigate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually I think we will need this to work. One use case for source maps is deobfuscation of stack traces collected from the wild, which means they need to be accurate (to the extent possible) for production binaries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. I'll get to it sooner, then.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 8, 2025

(Meanwhile I am pretty sure this is correct wrt the binary format. The last difference with the spec interpreter turned out to be a bug on that side, WebAssembly/branch-hinting#34)

@kripken kripken merged commit f634373 into WebAssembly:main May 8, 2025
14 checks passed
@kripken kripken deleted the branch.hint.binary branch May 8, 2025 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants