[feat] Kernel-level fusion #251
base: main
Conversation
…into kernel_fusion
How do we use this file for testing?
This file can be used for testing with Polygeist as the front-end, which can lower C++ to IR in the affine dialect. The lowered IR is provided in test.mlir, and kernel.cpp is only for reference.
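For illustration only, here is the kind of C++ loop nest Polygeist can lower to the affine dialect. This sketch is not the actual kernel.cpp from this PR, and the cgeist invocation in the comment is an assumption (flag names may differ across Polygeist versions):

```cpp
// Hypothetical example, not the kernel.cpp in this PR.
// An invocation along these lines produces affine-dialect MLIR
// (flags are from memory and may vary by Polygeist version):
//   cgeist kernel.cpp --function=kernel -S --raise-scf-to-affine -o test.mlir
#define N 128

void kernel(int a[N][N], int b[N][N], int c[N][N]) {
  for (int i = 0; i < N; ++i)        // becomes an affine.for after lowering
    for (int j = 0; j < N; ++j)
      c[i][j] = a[i][j] + b[i][j];   // body maps to affine.load/store + arith ops
}
```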
Then can we really leverage Polygeist to compile it, rather than keeping it just as a reference?
#define START_PIPE_OCCUPY 1 // A multi-cycle op starts in the FU
#define END_PIPE_OCCUPY 2 // A multi-cycle op ends in the FU
#define IN_PIPE_OCCUPY 3 // A multi-cycle op is occupying the FU (pipelined)
Aren't the 3 *_PIPE_OCCUPY overlapping with each other?
Actually, 3 means the multi-cycle op will not occupy the input and output ports of the tile, so we can map other operations onto this tile; this is the inclusive execution we proposed in our DATE paper.
However, I have not finished the implementation and testing of inclusive execution yet. Here I just copied some content from CGRA-Mapper and will tune it in the future.
So IN_PIPE_OCCUPY does not include start and end, right?
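For what it's worth, here is a small sketch of one reading of this thread: only the start and end cycles of a pipelined multi-cycle op occupy the tile's ports, while the in-between cycles leave them free for other mapped operations. The helper below is hypothetical, not code from this PR or from CGRA-Mapper:

```cpp
#include <cassert>

// Assumed reading of the thread above: for a pipelined multi-cycle op of
// `latency` cycles starting at `startCycle`, only the first cycle occupies
// the tile's input ports (START) and only the last cycle occupies the output
// ports (END); the cycles in between (IN) leave the ports free, so other
// operations can share the tile ("inclusive execution").
#define START_PIPE_OCCUPY 1 // A multi-cycle op starts in the FU
#define END_PIPE_OCCUPY 2   // A multi-cycle op ends in the FU
#define IN_PIPE_OCCUPY 3    // A multi-cycle op is occupying the FU (pipelined)

int occupancyAt(int startCycle, int latency, int cycle) {
  assert(latency >= 2 && cycle >= startCycle && cycle < startCycle + latency);
  if (cycle == startCycle)
    return START_PIPE_OCCUPY;            // input ports busy
  if (cycle == startCycle + latency - 1)
    return END_PIPE_OCCUPY;              // output ports busy
  return IN_PIPE_OCCUPY;                 // FU pipeline busy, ports free
}
```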
FuseKernelPass is an MLIR optimization pass that merges neura.kernel operations (pipeline: assign-accelerator → lower-arith → canonicalize → transform-ctrl-to-dataflow). The estimated minimum initiation interval is MII_est = ⌈(1 + α×ops/tiles) × (1 + β×max(fanout-4, 0)) × max(RecMII, ResMII)⌉. The pass compares fused vs. separate execution: fusion only proceeds if MII_fused ≤ max(MII_k1, MII_k2). Note that this equation is designed to take resources and congestion (fanout) into consideration; stay tuned. An example is shown below, where the two loops are wrapped in the same kernel:
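Before that example, here is a quick sketch spelling out the MII_est formula and the fusion check described above. It is illustrative only, not the pass's actual implementation: α and β are assumed tuning constants, and the per-kernel statistics are placeholders.

```cpp
#include <algorithm>
#include <cmath>

// Placeholder statistics for one kernel (names are assumptions).
struct KernelStats {
  int ops;     // number of operations in the kernel's dataflow graph
  int tiles;   // number of available CGRA tiles
  int fanout;  // maximum fanout of any value in the graph
  int recMII;  // recurrence-constrained MII
  int resMII;  // resource-constrained MII
};

// MII_est = ceil((1 + alpha*ops/tiles) * (1 + beta*max(fanout-4, 0))
//                * max(RecMII, ResMII))
int estimateMII(const KernelStats &k, double alpha, double beta) {
  double pressure = 1.0 + alpha * static_cast<double>(k.ops) / k.tiles;
  double congestion = 1.0 + beta * std::max(k.fanout - 4, 0);
  return static_cast<int>(
      std::ceil(pressure * congestion * std::max(k.recMII, k.resMII)));
}

// Fusion only proceeds if the fused kernel's estimated MII is no worse than
// the worse of the two kernels executed separately.
bool shouldFuse(const KernelStats &fused, const KernelStats &k1,
                const KernelStats &k2, double alpha, double beta) {
  int miiFused = estimateMII(fused, alpha, beta);
  return miiFused <= std::max(estimateMII(k1, alpha, beta),
                              estimateMII(k2, alpha, beta));
}
```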