Regarding the coarse-grained workload generation, I noticed that the backward communication type is currently set to NONE. After reviewing the discussion in #123, we are considering replacing NONE with ALLREDUCE in the backpropagation phase and filling in the corresponding communication size (e.g., tp_comm_size), so that this traffic shows up in our ns-3 experiments on network congestion and potential solutions.
Is this the recommended way to capture backward gradient sync traffic? Could it introduce unintended side effects or logical conflicts within the SimAI simulation framework, such as double-counting traffic that the grad_param_comm node already models?
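
For concreteness, here is a minimal sketch of the edit we have in mind, applied to one workload row. The column positions, the example row, and the helper name `patch_backward_comm` are assumptions made for illustration only and are not taken from the SimAI sources. The real field order should be checked against the generated workload file.

```python
def patch_backward_comm(line, type_idx, size_idx, comm_size, new_type="ALLREDUCE"):
    """Return `line` with a NONE backward comm type replaced by `new_type`.

    `type_idx` and `size_idx` are the (assumed) positions of the backward
    communication type and size in a whitespace-separated workload row.
    """
    fields = line.split()
    if len(fields) <= max(type_idx, size_idx):
        return line  # header or short row: leave untouched
    if fields[type_idx] != "NONE":
        return line  # row already carries backward traffic: do not overwrite it
    fields[type_idx] = new_type
    fields[size_idx] = str(comm_size)
    return "\t".join(fields)


# Hypothetical row layout: name, dependency, fwd compute/type/size, bwd compute/type/size
row = "attention_layer -1 1000 ALLREDUCE 4096 2000 NONE 0"
print(patch_backward_comm(row, type_idx=6, size_idx=7, comm_size=4096))
```

Rows whose backward type is already set are left unchanged. This guard only covers the per-row case. It does not tell us whether the same bytes are also counted by grad_param_comm, which is the part we are unsure about.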