Conversation
…ation.
1. Synchronization issues due to overlapping sources (atomics enabled by default)
2. Inconsistencies caused by changing ngsl in previous patch
3. Inconsistencies when using the DM.
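The first issue can be illustrated outside the solver: when several sources land on the same grid point, a plain read-modify-write update drops duplicate contributions, while atomic accumulation keeps every one. A minimal NumPy sketch (illustration only, not the solver's CUDA code; `np.add.at` plays the role of `atomicAdd`):

```python
import numpy as np

# Two sources land on the same grid point (index 0).
idx = np.array([0, 0, 1])
vals = np.array([1.0, 1.0, 1.0])

# Fancy-indexed += performs one read-modify-write per unique index,
# so the duplicate contribution at index 0 is lost:
grid = np.zeros(3)
grid[idx] += vals      # grid[0] ends up 1.0, not 2.0

# np.add.at accumulates every entry, mirroring what atomicAdd
# guarantees for overlapping sources on the GPU:
grid_atomic = np.zeros(3)
np.add.at(grid_atomic, idx, vals)   # grid_atomic[0] == 2.0
```

This is why enabling atomics by default fixes the overlapping-source synchronization issue.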
commit 30a250af75a1802cf59b31d238e4a51874b9a3de
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Thu Oct 1 10:44:24 2020 -0700
fix test.
commit 609795374d4f755e9b6f6dfe8d9c822fa3085780
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Thu Oct 1 10:39:25 2020 -0700
add option to toggle partitioning mechanism depending on if it is a source or receiver.
commit a28d36ecb6d9889fff87f6cb5079acc187244c97
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Wed Sep 30 23:37:19 2020 -0700
Fix DM source receiver parallel bug. Now, mpi-distribute test is broken.
commit 2ba3fca0f26dd2d21ab7760d3629b5de7475f44c
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Wed Sep 30 22:33:20 2020 -0700
Modify grid bounds check used by source distribution (increase width by 2 points in each direction). 100 x 100 sources work to rounding error. Exact results if interpolation is disabled because then all the overlapping sources add the same value and then the order doesn't matter.
commit 9fdbad904c5891658b7bda1ca21a753a31eb534f
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Wed Sep 30 20:39:14 2020 -0700
fix parallel bugs that breaks the source implementation on a single grid and even when interpolation is disabled. This point may have introduced parallel bugs in the receivers. Interpolation not yet working.
commit 7cf4995b41d431ccaa2f4aca07f49fd52b39d1e8
Author: Ossian O'Reilly <ossian.oreilly@gmail.com>
Date: Wed Sep 30 17:13:32 2020 -0700
add atomic operations to sources to prevent parallel sync issues when the sources are overlapping.
…s in the source partitioning function.
hzfmer
left a comment
I heard that the duplicate/overlapping subfaults issue is solved in this source read approach. I am not sure how, or in which commit, it was done. Hopefully the change I pointed out won't affect that (I did some preliminary tests, which seem promising).
There is, however, another bug: if duplicate receivers are provided, the code freezes while writing the first receiver output file (it doesn't finish the first "x" output file, and never creates the "y" and "z" files).
nlocal++;
break;
case DIST_INSERT_INDICES:
(*indices)[j] = i;
I suppose this should be (*indices)[i] = j;, which would mean updating source/receiver i with the number of previously seen sources/receivers, j.
First the code counts the number of sources/receivers that correspond to a particular partition; call this count nidx. Then this function gets called again to fill an array of length nidx that contains the index of each source/receiver for the given partition. So for each partition, we have an array that contains the indices of the input query points. For example,
qx = [0.0, 1.0, 1.0, 0.0]
qy = [0.0, 0.0, 1.0, 1.0]
Let's call q = (qx, qy). Suppose that we have two partitions (p0 and p1), obtained by splitting at x = 0.5. Then q0 and q3 belong to p0, whereas q1 and q2 belong to p1. In this case, we get indices = [0, 3] for p0, and indices = [1, 2] for p1.
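For illustration, the two-pass pattern described above (count first, then fill) can be sketched in Python. This is a simplified stand-in for the C code in src/mpi/distribute.c, not the actual implementation; `in_partition` is a hypothetical predicate:

```python
# Sketch of the two-pass partition indexing: pass 1 counts the points
# owned by a partition (DIST_COUNT), pass 2 fills an array of that
# length with their global indices (DIST_INSERT_INDICES).
def dist_indices(qx, qy, in_partition):
    # Pass 1: count query points owned by this partition.
    nidx = sum(1 for x, y in zip(qx, qy) if in_partition(x, y))
    # Pass 2: i is the global query-point index, j the local slot.
    indices = [0] * nidx
    j = 0
    for i, (x, y) in enumerate(zip(qx, qy)):
        if in_partition(x, y):
            indices[j] = i   # corresponds to (*indices)[j] = i in C
            j += 1
    return indices

qx = [0.0, 1.0, 1.0, 0.0]
qy = [0.0, 0.0, 1.0, 1.0]
p0 = dist_indices(qx, qy, lambda x, y: x < 0.5)   # [0, 3]
p1 = dist_indices(qx, qy, lambda x, y: x >= 0.5)  # [1, 2]
```

Note that each partition gets its own indices array, which is why (*indices)[j] = i (global index stored at local slot j) is the intended assignment.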
Thank you for noticing that duplicate receivers cause the code to freeze. I will look into that today.
Maybe I have misunderstood the code, but I thought the indices are stored in a flat array src->indices, whose length is src->length; it's not a 2D array like array[ngrids][nidx]. Taking your example, we would see the following:
// for (int j = 0; j < ngrids; ++j) calls the function "dist_indices" a second time, with src->length = 4 determined.
// grid partition = 0, j starts from 0, and picks i = 0, 3
src->indices[0] = 0;
src->indices[1] = 3;
// grid partition = 1, j starts from 0 again, and picks i = 1, 2
src->indices[0] = 1;
src->indices[1] = 2;
Then, we seem to be unable to locate the indices for src[2] and src[3], which leads to a segmentation fault, since src->indices[2:4] are not initialized. Please correct me if I have misunderstood anything.
I have attached a Python script that sets up a small test, in which I found the code fails with a segmentation fault (somehow I cannot upload files here..):
http://sendanywhe.re/Q35W2SCO
Simply run python setup_test.py, and modify the executable path if you want to try it.
In topography/sources/source.c, in the function source_init_common, you will see that src->length gets set to the number of sources that this partition owns. In this case, 2.
Thanks for providing a test, I will see if I can run it and reproduce the problem on my end.
@hzfmer I can't run your script.
return np.pad(resample(stf.squeeze(), n), (0, ntotal - n)).astype('float32')
NameError: name 'resample' is not defined
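This NameError just means setup_test.py never imports resample; most likely (an assumption, since the script isn't shown here) it intended SciPy's from scipy.signal import resample. Below is a SciPy-free sketch of the resample-then-pad expression from the traceback, with resample_linear as a hypothetical stand-in for the real resampler:

```python
import numpy as np

def resample_linear(x, n):
    # Hypothetical stand-in for scipy.signal.resample: linear
    # interpolation onto n evenly spaced samples.
    old = np.linspace(0.0, 1.0, len(x))
    new = np.linspace(0.0, 1.0, n)
    return np.interp(new, old, x)

def padded_stf(stf, n, ntotal):
    # Mirrors the traceback's expression:
    # np.pad(resample(stf.squeeze(), n), (0, ntotal - n)).astype('float32')
    out = np.pad(resample_linear(np.squeeze(stf), n), (0, ntotal - n))
    return out.astype('float32')
```

If SciPy is available, adding the missing import and using scipy.signal.resample directly should restore the script's intended behavior.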
Did you try
pmcl3d `cat param.sh`
? That's what I'm using to run it interactively on my machine.
Sure! I have set up a zoom meeting:
https://SDSU.zoom.us/j/5435014827?pwd=UVJpWE1GNnE5dTJEaEVkdGhZWmJYQT09
Please feel free to join anytime when you are available.
Nice! Just give me 15 mins here.
It seems there is another problem, which I found hard to track down. After some investigation, I found it is unrelated to the size of the output. It can actually be reproduced by inserting just three receivers, placed in partitions #0, #1, #0, respectively.
Specifically, this doesn't work:
i=0, src_idx = 0 (in the top block)
i=1, src_idx = 2 (in the bottom block)
i=2, src_idx = 1 (in the top block)
While this works:
i=0, src_idx = 0 (in the top block)
i=1, src_idx = 1 (in the top block)
i=2, src_idx = 2 (in the bottom block)
The error message is:
MPI error: /ccs/home/hzfmer/scratch/awp/src/mpi/io.c:mpi_io_idx_write():92 0: MPI_ERR_TYPE: invalid datatype
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 3.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
If you would like to reproduce it, just insert the three lines below as coordinates in receiver.txt, and optionally change length to 3.
0 0.0 0.0 0.0
0 0.0 0.0 -1000.0
0 0.0 0.0 -15.0
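One debugging aid for the MPI_ERR_TYPE failure (a hypothetical helper, not code from this PR): MPI-IO file views require nonnegative, monotonically nondecreasing displacements, and empty index lists also need special handling, so it can help to validate each rank's global index list before building the indexed datatype passed to mpi_io_idx_write:

```python
def check_view_indices(indices):
    """Validate a rank's global receiver indices before they are used
    to build an MPI-IO indexed file view. Returns a list of problems
    (empty if the list looks usable)."""
    problems = []
    if len(indices) == 0:
        problems.append("empty index list (rank owns no receivers)")
    if any(i < 0 for i in indices):
        problems.append("negative index")
    if any(b <= a for a, b in zip(indices, indices[1:])):
        problems.append("indices not strictly increasing")
    return problems
```

Whether this is the actual cause of the error above is an open question in the thread; the helper only narrows down which rank hands an ill-formed index list to the I/O layer.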
Nice job reducing the problem down to a minimal example! I'm sorry, but I don't have access to your test code anymore. Could you share it with me again, please? You can email me. Also, do I need the latest changes we made to the code to reproduce the error? If so, please push them to this branch so that I can access them. Thanks.
Yes. Since the previous patch did fix the bug we found in Pull Request 13 (I have pushed it to the remote “fixes” branch (af2d0b4)), we should run tests on this patch.
I have attached the script to set up the tests, and, as I mentioned, inserting a few lines in “input/receiver.txt” should reproduce the problem.
Thanks,
Zhifeng
…On Nov 23, 2020, 17:06 -0800, ooreilly ***@***.***>, wrote:
@ooreilly commented on this pull request.
In src/mpi/distribute.c:
> grid_numbers[i] == grid_number) {
- idx[j] = i;
- j++;
+ printf("i = %ld found in block %d \n", i, grid_number);
+ switch (mode) {
+ case DIST_COUNT:
+ nlocal++;
+ break;
+ case DIST_INSERT_INDICES:
+ (*indices)[j] = i;
@hzfmer the old link to your test code doesn't work anymore. I also don't see any attachments. Where should I look?
…he cell-centered grid points.
… point with no interpolation.
…ize the amount of overlap.
DM compatible source and receiver placement and test. Convergence test and fixes to some broken tests. Point force implementation for the original AWP boundary condition.
* fix incorrect placement of density and incorrect interpolation of density in force computation.
* begin mms module.
* add mms files.
* fix density interpolation.
* comment out work in progress code.
* override material properties.
* initialize mms solution.
* add forcing functions.
* Use plane wave solutions.
* remove unused libraries, duplicate error header file, and receiver input file to mms module.
* delete unused kernels.
* remove block size from definitions.h and remove this dependency from some of the files.
* update mms so that it uses both P and S plane waves.
* Fix kernel generator so that it runs with updated sympy version, and fix off by one errors in density interpolation.
* Fix off by 0.5h error for source/receiver placement of components that are located at the cell-centered grid points in the z-direction. This fix only applies to the DM.
* start adding point force support for the traditional bc.
* fix placement of receivers on the free surface for regular AWP.
* add surface force to original AWP. doesn't pass the reciprocity test.
* Implement point forces in original AWP using stress perturbations. Passes the reciprocity test.
* apply symmetry-based boundary conditions, and add body force.
* start reintroducing convergence and energy tests.
* interior convergence for the velocity field is working.
* Cartesian velocity convergence tests show second order convergence. Added option to enable/disable application of bc.
* get interior stress convergence working.
* convergence test might be working for a cartesian geometry. It is hard to tell in single precision.
* add option to have double-precision topography files.
* remove double-precision support because it doesn't work.
* put the truncation error test as part of the unit test. Document the test.
* add force extrapolation.
* update grid printing.
* remove broken test.
* restore original AWP FS implementation.
* add accuracy test data.
* set c++ standard.
* apply most recent mms changes.
* add new grid implementation that maps the global z coordinate to the local block coordinate.
* add arg to grid_fill1 that corrects for AWP's (-,+,+) choice. Currently, the correction is not applied.
* remove shift in x-direction. Causes several tests to fail.
* fix convergence test.
* fix broken interpolation test.
* fix broken grid_3d test.
* fix broken metrics tests.
* update dm source/receiver test so that it checks all components.
* remove module grid_new.
* change grid number identification process.
* change to default AWP force implementation.
* restore find_grid_number
* remove old arch flags.
Co-authored-by: Ossian O'Reilly <ooreilly@usc.edu>
Modifies the mapping in the overlap zone such that it is compatible with the DM.
* fix incorrect placement of density and incorrect interpolation of density in force computation.
* begin mms module.
* add mms files.
* fix density interpolation.
* comment out work in progress code.
* override material properties.
* initialize mms solution.
* add forcing functions.
* Use plane wave solutions.
* remove unused libraries, duplicate error header file, and receiver input file to mms module.
* delete unused kernels.
* remove block size from definitions.h and remove this dependency from some of the files.
* update mms so that it uses both P and S plane waves.
* Fix kernel generator so that it runs with updated sympy version, and fix off by one errors in density interpolation.
* Fix off by 0.5h error for source/receiver placement of components that are located at the cell-centered grid points in the z-direction. This fix only applies to the DM.
* start adding point force support for the traditional bc.
* fix placement of receivers on the free surface for regular AWP.
* add surface force to original AWP. doesn't pass the reciprocity test.
* Implement point forces in original AWP using stress perturbations. Passes the reciprocity test.
* apply symmetry-based boundary conditions, and add body force.
* start reintroducing convergence and energy tests.
* interior convergence for the velocity field is working.
* Cartesian velocity convergence tests show second order convergence. Added option to enable/disable application of bc.
* get interior stress convergence working.
* convergence test might be working for a cartesian geometry. It is hard to tell in single precision.
* add option to have double-precision topography files.
* remove double-precision support because it doesn't work.
* put the truncation error test as part of the unit test. Document the test.
* add force extrapolation.
* update grid printing.
* remove broken test.
* restore original AWP FS implementation.
* add accuracy test data.
* set c++ standard.
* apply most recent mms changes.
* add new grid implementation that maps the global z coordinate to the local block coordinate.
* add arg to grid_fill1 that corrects for AWP's (-,+,+) choice. Currently, the correction is not applied.
* remove shift in x-direction. Causes several tests to fail.
* fix convergence test.
* fix broken interpolation test.
* fix broken grid_3d test.
* fix broken metrics tests.
* update dm source/receiver test so that it checks all components.
* remove module grid_new.
* change grid number identification process.
* change to default AWP force implementation.
* restore find_grid_number
* remove old arch flags.
* fix mms so that it works with grid stretching.
* reintroduce mapping so that the grid stretching function achieves zero at the overlap zone.
* add mapping to vx.
* add mapping to vy.
* add mapping to xz, yz.
* change overlap zone point.
* add mapping to velocity buffers.
Co-authored-by: Ossian O'Reilly <ooreilly@usc.edu>
* add new command line options.
* add mapping module.
* add mapping object to topography.
* apply mapping in topography module.
* rename mapping functions.
* add mapping inversion that determines the parameter space coordinate given the physical coordinate.
* fix DM grid stretching problem.
* fix serial reader test.
* remove print statements.
* fix write grid tool.
* add DM and nonuniform grid stretching compatibility to the curvilinear grid writing tool.
* set block size to its correct size.
* add top and bottom grid spacing parameters.
* start fixing convergence test.
* update convergence test to account for modified mapping.
* fix mapping test.
* remove print statement.
* remove broken test.
Add energy rate output.
* add energy command line args and empty module.
* initialize energy and write dummy output.
* add reduction kernel.
* add all fields.
* energy rate without boundary treatment.
* energy conservation works for homogeneous material properties, no topography, no exterior boundaries.
* add jacobians.
* add random density.
* add shear modulus.
* add lambda parameter.
* add mpi reduce.
* fix off by one error in density interpolation in the x-direction.
* add time vector.
…int force parallel consistency issue (#26)
* fix incorrect placement of source/receivers in the y-direction when using multiple grid blocks. This fix shifts sources/receivers in the y-direction by one grid spacing of the previous grid: y = y_prev - h_prev.
* remove inactive code in source initialization.
* Refactor source distribution (#27)
* change y-axis for DM blocks instead of moving source/recv.
* init source distribution.
* modify partitioning of receivers so that it is easier to understand.
* modify partitioning of sgts.
* replace old partitioning kernels with new ones that use grid indices instead of coordinates.
* refactor distribute.
* add grid block number to data init function so that it correctly generates y-values for DM blocks.
* delete old partition functions.
* cleanup partitioning functions and add guard statements to source kernels.
* remove guard statements.
* add default case.
* update comment.
* Fix parallel bug in point force insertion (#28)
* add error code handling for mapping module
* Fix parallel bug in point force insertion.
* clean-up error handling in mapping.
* Fix point force guards for Cartesian kernel and disable debugging output.
* Refactor source distribution (#29)
* change y-axis for DM blocks instead of moving source/recv.
* init source distribution.
* modify partitioning of receivers so that it is easier to understand.
* modify partitioning of sgts.
* replace old partitioning kernels with new ones that use grid indices instead of coordinates.
* refactor distribute.
* add grid block number to data init function so that it correctly generates y-values for DM blocks.
* delete old partition functions.
* cleanup partitioning functions and add guard statements to source kernels.
* remove guard statements.
* add default case.
* update comment.
* add error code handling.
* debug partitioning.
* Fix parallel bug in point force insertion. The point force must be applied before the front/back send/recv kernels execute.
* clean-up error handling in mapping.
* Fix point force guards for Cartesian kernel and disable debugging output.
* fix bounds guards for point force kernels.
* force source/recvs in the overlap zone to be placed on the coarser grid.
* cleanup.
* fix find_grid_number so that it maps surface coordinates to the top block.
… issue in write_grid tool with allocation of memory.
…on velocity (QSI, QPQSR, MAXVPVSR, VMIN, VMAX, and DMIN).
…hen topography is turned on.
Redesign cerjan function so that ND is no longer bounded by min(nxt,nyt)
Remove textures for CUDA 12+, Add compile scripts
This PR introduces the following fixes:
ngsl from 8 to 4.
Recommendations:
Please check that the source propagates correctly across the DM overlap zone. Run two test simulations:
Check in each case that waves can propagate across the overlap zone without causing numerical artifacts.