Merge rv32im opcodes with similar functionality into a single chip. Benefits:

- fewer kernel invocations
- fewer small-layer sumchecks, which in particular aims to improve GPU utilization

### opcode group

- [ ] all load/store opcodes
- [ ] branching
- [ ] logical
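
To make the grouping concrete, here is a minimal sketch of how opcodes could map to merged chips. The names `Opcode`, `ChipGroup`, `chip_group`, and `chips_needed` are hypothetical and not taken from the codebase. The exact membership of each group is also an assumption based on the standard RV32I load/store, branch, and register-register logical instructions.

```rust
use std::collections::BTreeSet;

/// Subset of RV32IM opcodes relevant to the three groups above, plus two
/// ungrouped opcodes (Add, Mul) for contrast. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Opcode {
    Lb, Lh, Lw, Lbu, Lhu, Sb, Sh, Sw,
    Beq, Bne, Blt, Bge, Bltu, Bgeu,
    And, Or, Xor,
    Add, Mul,
}

/// One merged chip per group in the checklist. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ChipGroup {
    LoadStore,
    Branch,
    Logical,
}

/// Returns the merged chip an opcode would belong to, or `None` if the
/// opcode keeps its own chip. Group membership is an assumption.
pub fn chip_group(op: Opcode) -> Option<ChipGroup> {
    use Opcode::*;
    match op {
        Lb | Lh | Lw | Lbu | Lhu | Sb | Sh | Sw => Some(ChipGroup::LoadStore),
        Beq | Bne | Blt | Bge | Bltu | Bgeu => Some(ChipGroup::Branch),
        And | Or | Xor => Some(ChipGroup::Logical),
        Add | Mul => None,
    }
}

/// Counts the distinct chips a trace touches, with or without merging.
/// Each distinct chip stands in for one kernel launch / sumcheck instance.
pub fn chips_needed(trace: &[Opcode], merged: bool) -> usize {
    if !merged {
        // One chip per distinct opcode.
        return trace.iter().copied().collect::<BTreeSet<_>>().len();
    }
    let mut grouped = BTreeSet::new();
    let mut standalone = BTreeSet::new();
    for &op in trace {
        match chip_group(op) {
            Some(group) => {
                grouped.insert(group);
            }
            None => {
                standalone.insert(op);
            }
        }
    }
    grouped.len() + standalone.len()
}
```

In this toy model, a trace that uses `Lw`, `Sw`, `Lb`, `Beq`, `Bne`, `And`, `Xor`, and `Add` touches eight chips unmerged but only four once the three groups are merged.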