Description
Running the following command:
./bin/SimAI_m4 -w ./example/microAllReduce.txt -n ./Spectrum-X_128g_8gps_100Gbps_A100.txt
causes a core dump:
terminate called after throwing an instance of 'c10::IndexError'
what(): select(): index 50400 out of range for tensor of size [50000] at dimension 0
Exception raised from select_symint at ../aten/src/ATen/native/TensorShape.cpp:1845 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7effbf460f86 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11606df (0x7effa25bd6df in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x2e2c393 (0x7effa4289393 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2e2e099 (0x7effa428b099 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::_ops::select_int::redispatch(c10::DispatchKeySet, at::Tensor const&, long, c10::SymInt) + 0xc5 (0x7effa3e77c05 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x50e26bc (0x7effa653f6bc in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x50e2aec (0x7effa653faec in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::_ops::select_int::redispatch(c10::DispatchKeySet, at::Tensor const&, long, c10::SymInt) + 0xc5 (0x7effa3e77c05 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x4a00749 (0x7effa5e5d749 in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x4a00edc (0x7effa5e5dedc in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) + 0x1ad (0x7effa3eded7d in /home/ma/m4/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x1ec5c (0x55df7fe91c5c in ./bin/SimAI_m4)
frame #12: <unknown function> + 0x21546 (0x55df7fe94546 in ./bin/SimAI_m4)
frame #13: <unknown function> + 0x37fb9 (0x55df7feaafb9 in ./bin/SimAI_m4)
frame #14: <unknown function> + 0x1e20b (0x55df7fe9120b in ./bin/SimAI_m4)
frame #15: <unknown function> + 0x1e3b0 (0x55df7fe913b0 in ./bin/SimAI_m4)
frame #16: <unknown function> + 0x1ea08 (0x55df7fe91a08 in ./bin/SimAI_m4)
frame #17: <unknown function> + 0x1b544 (0x55df7fe8e544 in ./bin/SimAI_m4)
frame #18: __libc_start_main + 0xf3 (0x7eff6bd21083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: <unknown function> + 0x1d43e (0x55df7fe9043e in ./bin/SimAI_m4)
I found that this is caused by the parameter "n_flows_max" in M4.cc.
The gdb history is as follows:
gdb.txt
If I set it to a higher value (such as 200000), are there any side effects?
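To illustrate the failure mode I suspect, here is a minimal sketch in plain C++. It is not the actual M4.cc code: `FlowTable` and `fits` are hypothetical names, and I am only assuming that per-flow state is preallocated with "n_flows_max" entries, so that flow index 50400 falls outside a buffer of size 50000.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-flow state buffer whose size is fixed
// up front by a capacity such as n_flows_max. Not the actual M4.cc code.
class FlowTable {
public:
    explicit FlowTable(std::size_t n_flows_max) : state_(n_flows_max, 0.0f) {}

    std::size_t capacity() const { return state_.size(); }

    // Checked access: throws with a message that names the capacity,
    // instead of failing deep inside a tensor library.
    float& at(std::size_t flow_id) {
        if (flow_id >= state_.size()) {
            throw std::out_of_range(
                "flow id " + std::to_string(flow_id) +
                " exceeds n_flows_max = " + std::to_string(state_.size()));
        }
        return state_[flow_id];
    }

private:
    std::vector<float> state_;
};

// Returns true if a workload with n_flows flows fits into the table.
inline bool fits(const FlowTable& table, std::size_t n_flows) {
    return n_flows <= table.capacity();
}
```

In this sketch, a table built with 50000 entries rejects index 50400, while one built with 200000 accepts it. The only cost visible here is memory that grows linearly with the capacity. Whether M4 has other dependencies on this value is exactly what I am asking.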