Farthest point sampling method #13
This method is a filter for the training set and can be used to reduce its size easily. It chooses the N structures that are farthest from each other, measured by distances between the sorted symmetry function values of each structure.
This reverts commit 5aa03fe. revert
added nnp-fpssampling support
- Merge branch 'master' of github.com:CompPhysVienna/n2p2
- Rename tool to nnp-fps
- Adapt coding style, minor code changes
- Internal log file usage
- Added CI tests for tool
- Updated test_nnp.h to asynchronous IO and std::future buffer (long output buffers would not fit in ipstream?)
- Added (unfinished) documentation
Codecov Report❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   58.33%   60.56%   +2.22%
==========================================
  Files          78       70       -8
  Lines       11380    10040    -1340
==========================================
- Hits         6639     6081     -558
+ Misses       4741     3959     -782
I have some difficulty understanding the reasoning behind the ordering, because it completely shuffles the order of symmetry functions and atoms. I have to think about it.
I think it would be reasonable to move the farthest point sampling from the level of structures to the atomic level, i.e. to compare atomic environment fingerprints of individual atoms rather than combined symmetry function vectors of entire structures.
The method selects those N structures from the training set that are farthest from each other in terms of a distance norm in symmetry function values. To this end, it collects all symmetry function values of a single structure into a vector "allG", sorts allG, and calculates the distance 1/n*|allG[i]-allG[j]| for pairs of structures with the same number of atoms and elements.
This application was inspired by
https://doi.org/10.1063/1.5024611
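For readers who want to see the selection idea in code, the description above can be sketched in a few lines of Python. This is only an illustration and not the nnp-fps implementation: all function names are hypothetical, n is assumed to be the length of allG, |.| is assumed to be the Euclidean norm, the greedy "pick the point farthest from everything chosen so far" loop and the choice of the first structure are assumptions, and all structures are assumed to share the same number of atoms and elements so that their vectors have equal length.

```python
import math


def sorted_fingerprint(symmetry_functions):
    """Collect all symmetry function values of one structure and sort them (allG)."""
    all_g = [g for atom in symmetry_functions for g in atom]
    return sorted(all_g)


def distance(a, b):
    """1/n * |a - b|, with n = len(a) and the Euclidean norm (both assumptions)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)


def farthest_point_sampling(fingerprints, n_select, start=0):
    """Greedily pick n_select indices, each farthest from the already chosen set."""
    if not 0 < n_select <= len(fingerprints):
        raise ValueError("n_select must be between 1 and the number of structures")
    selected = [start]
    # min_dist[i]: distance from structure i to its nearest selected structure.
    min_dist = [distance(fp, fingerprints[start]) for fp in fingerprints]
    min_dist[start] = -1.0  # marks "already selected", never picked again
    while len(selected) < n_select:
        nxt = max(range(len(fingerprints)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, fp in enumerate(fingerprints):
            if min_dist[i] < 0.0:
                continue  # skip structures that are already selected
            min_dist[i] = min(min_dist[i], distance(fp, fingerprints[nxt]))
    return selected


# Toy data: 4 structures, 2 atoms each, 2 symmetry function values per atom.
structures = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.1, 0.2], [0.3, 0.5]],
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],
]
fingerprints = [sorted_fingerprint(s) for s in structures]
print(farthest_point_sampling(fingerprints, 3))  # → [0, 2, 3]
```

Structure 1 is dropped because it is nearly identical to structure 0. Sorting makes the vector independent of atom ordering, at the price of mixing values from different symmetry functions and atoms.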