cpu: rv64: share RVV eltwise emitters #9
Closed
Ga1axy0 wants to merge 1 commit into
Description
This PR adds RVV JIT support for additional RV64 eltwise forward algorithms and factors shared eltwise code generation into a reusable emitter helper.
The new emitter is used by the RV64 eltwise JIT kernels and by the f16 softmax exp-sub-sum kernel through `elt.exp()`. This avoids duplicating the RVV exp polynomial sequence in softmax while keeping softmax kernels as regular `jit_generator_t` users.
The emitter is intended to cover the regular finite-input fast path. It does not classify or preserve special NaN/Inf values on its own. Some of the newly added algorithms, including `exp`, `tanh`, and `gelu_tanh`, apply explicit lower/upper bounds with RVV min/max instructions. Existing clamp-based eltwise algorithms such as `hardsigmoid` and `clip` also use min/max-style clamping without explicit special-value fixups.
If special NaN/Inf preservation is required for these paths, it likely needs to be discussed as a unified RVV JIT eltwise policy rather than handled only for the newly added bounded emitters. Adding per-lane special-value detection and fixup in the kernel would introduce extra comparisons, masks, and merges on the eltwise hot path, which may affect the overall performance of these kernels.
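The trade-off described above can be illustrated with a scalar sketch. This is not the emitter code: the bounds `kExpLo` and `kExpHi` and both function names are hypothetical, chosen only for illustration. The sketch relies on the fact that `std::fmin`/`std::fmax` return the non-NaN operand when exactly one input is NaN, which matches the behavior of the RVV `vfmin`/`vfmax` instructions, so a clamp-only path maps NaN to a finite result.

```cpp
#include <cmath>
#include <limits>

// Hypothetical bounds for illustration only; the PR does not state the
// values used by the RVV emitter.
constexpr float kExpLo = -87.0f;
constexpr float kExpHi = 88.0f;

// Fast path: clamp with fmin/fmax, then exponentiate.
// A NaN input is replaced by kExpHi during the clamp, so the result is
// finite and the NaN is not preserved.
inline float bounded_exp(float x) {
    const float clamped = std::fmax(kExpLo, std::fmin(kExpHi, x));
    return std::exp(clamped);
}

// Same computation with an explicit special-value fixup. In a vector kernel
// each branch would become a compare that builds a mask plus a masked merge,
// which is the extra hot-path work mentioned above.
inline float bounded_exp_with_fixup(float x) {
    if (std::isnan(x)) return x;                   // preserve NaN
    if (std::isinf(x)) return x > 0 ? x : 0.0f;    // exp(+inf)=+inf, exp(-inf)=0
    return bounded_exp(x);
}
```

Calling `bounded_exp` with a quiet NaN returns `std::exp(kExpHi)`, while `bounded_exp_with_fixup` returns NaN, at the cost of two extra classifications per element.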
The added/updated eltwise coverage includes:
- `tanh`
- `logistic`
- `swish`
- `elu`
- `gelu_tanh`
- `gelu_erf`
- `exp`
Validation
Benchdnn validation was run on a local RV64 environment with the `option_set_all_algs` case list for f32 and f16:
Result notes:
- `tests:1274 passed:1120 skipped:154 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0`
- `tests:1274 passed:1106 skipped:168 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0`
- Skipped tests are `Invalid case` entries, such as invalid alpha/beta combinations for `elu_dst`, `relu_dst`, `clip`, `clip_v2`, and `clip_v2_dst`; f16 also skips `round`.
- `tests:32 passed:32 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0`
Checklist
General
- Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
Notes
To reuse the emitter from another RV64 JIT kernel:
- Include `cpu/rv64/jit_rvv_eltwise_emitter.hpp` and create `jit_rvv_eltwise_fwd_emitter_t elt(this)` inside a `jit_generator_t` subclass.
- Reserve registers for the constants (`alpha`, `beta`, `zero`, `one`), floating temporaries, and integer temporaries, then pass them through `eltwise_aux_regs_t`. The actual source and destination vectors are passed explicitly to each helper.
- Call a helper such as `elt.exp(regs, v_dst, v_src)` or `elt.gelu_tanh(regs, v_dst, v_src)`. Helpers write the selected destination vector and may freely use the scratch registers listed in `eltwise_aux_regs_t`.
Softmax f16 example:
Here softmax provides the surrounding algorithm semantics: it subtracts the row maximum before calling `elt.exp()`, accumulates the exponentials, and stores the temporary f32 values for the later normalization pass.
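The pass structure described above can be written as a scalar reference, which may help when checking the JIT kernel's output. This is a sketch, not the kernel: the function names `exp_sub_sum` and `normalize` are hypothetical, `std::exp` stands in for `elt.exp()`, and the input is taken as `float` rather than f16 because standard C++ has no portable half-precision type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// First pass for one softmax row: subtract the row maximum, exponentiate,
// store the f32 temporaries, and return their sum.
inline float exp_sub_sum(const std::vector<float> &row,
        std::vector<float> &tmp) {
    tmp.resize(row.size());
    if (row.empty()) return 0.0f;
    const float row_max = *std::max_element(row.begin(), row.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i) {
        // Inputs to exp are <= 0 after the subtraction, so the result
        // stays in (0, 1] and cannot overflow.
        tmp[i] = std::exp(row[i] - row_max); // stand-in for elt.exp()
        sum += tmp[i];
    }
    return sum;
}

// Second pass: divide the stored temporaries by the accumulated sum.
inline void normalize(std::vector<float> &tmp, float sum) {
    for (float &v : tmp)
        v /= sum;
}
```

Because the row maximum is subtracted first, the largest element always maps to exactly 1, so the accumulated sum is at least 1 for any non-empty row and the division in the second pass is safe.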