Update Wasmi to v2.0.0-beta.1 #6

Open
Robbepop wants to merge 1 commit into firefly-zero:main from Robbepop:rf-update-wasmi-to-v2.0.0-beta.1

@Robbepop
Contributor

Robbepop commented Feb 19, 2026

cc @orsinium

Note: Wasmi v2.0.0-beta.1 really is just a beta version and is not production ready. Be warned.

That said, I would love to know if and how this new Wasmi version changes performance for the firefly-zero project on your hardware, so that I can apply necessary changes and optimizations before releasing the stable Wasmi v2.0.0.

Furthermore, due to internal changes to the interpreter architecture, this version of Wasmi must always be built with at least

opt-level = 2
codegen-units = 1

even in dev mode. Otherwise, sibling calls won't get optimized and you get a stack overflow upon execution.
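
To illustrate why a missing sibling-call optimization can end in a stack overflow, here is a minimal sketch. It assumes an interpreter in which every instruction handler finishes by calling the dispatcher for the next instruction. The three-instruction bytecode and the names `Op`, `Vm`, `Handler`, and `run_threaded` are hypothetical and not taken from Wasmi's source.

```rust
// Hypothetical toy bytecode; NOT Wasmi's real instruction set.
#[derive(Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    Halt,
}

pub struct Vm {
    code: Vec<Op>,
    stack: Vec<i64>,
}

// All handlers share one signature, so a call from one handler to the
// next is a sibling call that the compiler may lower to a plain jump.
type Handler = fn(&mut Vm, usize) -> i64;

// Look up the handler for the instruction at `pc` and call it in tail position.
fn dispatch(vm: &mut Vm, pc: usize) -> i64 {
    let handler: Handler = match vm.code[pc] {
        Op::Push(_) => op_push,
        Op::Add => op_add,
        Op::Halt => op_halt,
    };
    handler(vm, pc)
}

fn op_push(vm: &mut Vm, pc: usize) -> i64 {
    if let Op::Push(value) = vm.code[pc] {
        vm.stack.push(value);
    }
    // Without tail-call elimination this call adds native stack frames.
    dispatch(vm, pc + 1)
}

fn op_add(vm: &mut Vm, pc: usize) -> i64 {
    let rhs = vm.stack.pop().expect("stack underflow");
    let lhs = vm.stack.pop().expect("stack underflow");
    vm.stack.push(lhs + rhs);
    dispatch(vm, pc + 1)
}

// Ends the chain of calls and returns the top of the value stack.
fn op_halt(vm: &mut Vm, _pc: usize) -> i64 {
    vm.stack.pop().unwrap_or(0)
}

// Runs a program that must end with `Op::Halt`.
pub fn run_threaded(code: &[Op]) -> i64 {
    let mut vm = Vm { code: code.to_vec(), stack: Vec::new() };
    dispatch(&mut vm, 0)
}
```

Calling `run_threaded(&[Op::Push(2), Op::Push(3), Op::Add, Op::Halt])` returns 5. In an optimized build the compiler may turn each trailing call into a jump so that native stack use stays flat, but Rust does not guarantee this. In an unoptimized build every executed instruction can leave frames behind, so a long enough program exhausts the stack.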

Unfortunately, due to

[patch.crates-io]
firefly-hal = { path = "../firefly-hal" }

it was a bit messy to check if this version still compiled, but according to my tests it does.

Comment on lines +23 to +25
[profile.dev.package.wasmi]
opt-level = 2
codegen-units = 1
Member

Do I need to also set it in all projects that use the crate?

Member

I found the answer myself: Yes 🙃

thread 'main' (733177) has overflowed its stack
fatal runtime error: stack overflow, aborting
[1]    733177 IOT instruction (core dumped)  cargo run

Member

Actually, no, it doesn't help. I still get a stack overflow in the emulator even with optimizations enabled.

Contributor Author

Robbepop commented Feb 19, 2026

That's an extremely good question. I'd say it is mostly important for the final binary.
For testing you could also use Wasmi's new portable-dispatch crate feature. It executes way slower, but at least it works universally.

Contributor Author

Can you try:

[profile.release]
opt-level = 3
codegen-units = 1

in your emulator?

Member

I tried that as well, same result.

Contributor Author

The last option is to enable Wasmi's portable-dispatch crate feature. Execution performance will likely suffer, but at least it is universally applicable.
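
The thread does not say how portable-dispatch works internally. As a rough sketch only, and assuming it resembles a conventional loop-and-match interpreter, the following shows why such a design does not depend on tail-call support in the compiler backend. The `Instr` bytecode and `run_loop` are hypothetical names, not Wasmi's API.

```rust
// Hypothetical toy bytecode; NOT Wasmi's real instruction set.
#[derive(Clone, Copy)]
pub enum Instr {
    Push(i64),
    Add,
    Halt,
}

// Executes a program that must end with `Instr::Halt`.
// All dispatch happens inside one loop in one function, so the native
// stack depth stays constant no matter how many instructions run.
pub fn run_loop(code: &[Instr]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Instr::Push(value) => stack.push(value),
            Instr::Add => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs + rhs);
            }
            Instr::Halt => return stack.pop().unwrap_or(0),
        }
        pc += 1;
    }
}
```

Because nothing here relies on calls being turned into jumps, this style behaves the same on every target and at every optimization level. The price is that each instruction goes through the central `match`, which is generally slower than jumping directly from handler to handler.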

@orsinium
Member

> It was a bit messy to check if this version still compiled, but according to my tests it does.

Yeah, sorry about that. It's the latest version in the git repo.

@orsinium
Member

> I would love to know if and how this new Wasmi version changes performance

I have some benchmarks for the device, but so far all of them focus on my implementation of host functions rather than on interpreter performance, since the latter is out of my control. I can hold off on merging the changes and later make some simple "CPU-heavy" benchmarks if you want.

@Robbepop
Contributor Author

Robbepop commented Feb 19, 2026

> I would love to know if and how this new Wasmi version changes performance
>
> I have some benchmarks for the device, but so far all of them focus on my implementation of host functions rather than on interpreter performance, since the latter is out of my control. I can hold off on merging the changes and later make some simple "CPU-heavy" benchmarks if you want.

I really don't want to cause you even more work than is already on your plate. But it would certainly be very nice to know upfront how future Wasmi versions behave, so that I can influence that. That way you kinda have control over the engine again. ;)

I plan some more significant optimizations for future Wasmi versions. So a set of compute benchmarks for your hardware would be really nice.

@orsinium
Member

I have published a new release. It now uses firefly-hal, so it should work out of the box locally for you.

Robbepop force-pushed the rf-update-wasmi-to-v2.0.0-beta.1 branch 2 times, most recently from f237f64 to 3102c30, on February 19, 2026 20:15
Robbepop force-pushed the rf-update-wasmi-to-v2.0.0-beta.1 branch from 3102c30 to bbf6f06 on February 19, 2026 20:17

@Robbepop
Contributor Author

Sorry, I messed up the git rebase because I forgot to re-sync my fork with upstream. I had to delete the history.

@Robbepop
Contributor Author

Robbepop commented Feb 19, 2026

@orsinium Can you try the following Cargo profile in your emulator and see if there are still stack overflows for release builds?

[profile.release]
opt-level = 3
codegen-units = 1

If there are still stack overflow issues, can you tell me the name of the target architecture so I can at least build for it and look at the assembly of Wasmi?

@Robbepop
Contributor Author

Okay, I have done some research: from the docs I can see that you are using Espressif chips.

Therefore either Xtensa or RISC-V based ones.

  • xtensa-esp32-none-elf
  • xtensa-esp32s2-none-elf
  • xtensa-esp32s3-none-elf
  • riscv32imc-unknown-none-elf
  • riscv32imac-unknown-none-elf

The RISC-V-based ones properly support tail calls, and LLVM should have no issues dealing with them.
However, it seems that LLVM's implementation for Xtensa-based chips is very incomplete or limited, and that sibling-call optimization there is unreliable.

@orsinium
Member

> Therefore either Xtensa or RISC-V based ones.

We're on ESP32-S3 now, which is Xtensa. We want to move to ESP32-P4, which is RISC-V, but the native Rust bindings don't support P4 (and there is no plan to support it in 2026), so I don't know yet if we will manage to do that.

@Robbepop
Contributor Author

> Therefore either Xtensa or RISC-V based ones.
>
> We're on ESP32-S3 now, which is Xtensa. We want to move to ESP32-P4, which is RISC-V, but the native Rust bindings don't support P4 (and there is no plan to support it in 2026), so I don't know yet if we will manage to do that.

Then enabling the portable-dispatch crate feature is the way to go. Just make sure the performance of v2 is not worse than that of v1.
