-
Notifications
You must be signed in to change notification settings - Fork 0
Home

- leftmost pane: the distribution(s) of distances:
- in green - the "baseline" distribution - taken from 4000 points uniform-randomly distributed in the unit square - by trial, all the distances between 4000 points seems to result in a smooth-enough-for-the-purpose distribution
- in blue, the distribution of distances between the "experimental" points (the ones in the middle pane)
- in red, the difference between the "experimental" distribution and the "baseline one"
- underneath, two (thin) progress bars showing the completion percentage of their respective distribution. A change in the "experimental" point set will trigger the re-computation for the "experimental" distribution and the progress will be shown by the blue progress bar. The same would go for the "baseline" one, except that the change of the "baseline point set" happens only in special circumstances (*maybe* explained later).
- mid pane - the "experimental point cloud positioning/distorting area". Note the presence of the "distortion hull" for the light-blue cluster of points in the lower right: dragging it's corners will distort the hull and, implicitly, the position of the points in the "cluster under edit"; use the mid-sides knobs on the hull to rotate it around the baricentre and the baricentre itself to translate the cluster in another position. (careful with the last one: you drop it outside the editing square, you won't be able to bring it back agaib - maybe in the future I'll look into this; if you so dropped your cluster, the solution is to delete and recreate it)
- right pane - "point collection editor" - you will be able to add/remove point clusters and specify their number/attributes, along with their corresponding "Adjust" checkbox - when ticked, this checkbox will cause the "distortion hull" become visible around the
If time will allow (and there'd be any interests from others), <<THIS will be replaced by a link to the user guide>>
The distribution of distances between two random points in a box (not necessary unit-box) is known as close formula up to three dimensions while at 4 and 5 dimensions (and, I assume, higher) don't.
Since the base code should work in higher dimensions without change, I went of the path of generating the "baseline" distribution the same way as the rest: by "experimentally" extracting the distance distribution of a random-uniform distribution of points within the (potentially n-dim) box.
But then... how many points in the "baseline point cluster" would be enough? Actually, the better question is "how many randomly chosen distances would result in a histogram close enough to the theoretical one"? Now, for the purpose of "exploring", getting an exact value (based on a "close enough" precision) would be an overkill, therefore I relied on the "eye test" - how many distances to consider to have a distance distribution which is smooth enough. Around 16·106 seems to be a maximal value for good enough (at least for 2D), the 4·106 is acceptable. The application allows the user to consider if s/he wants to compute the exhaustive set of all distances or only a certain count of them - there the checkbox/spinbox group in the upper-right.
Below, the screen captures for the exhaustive vs 16·6 for the about 24000 points (in 12 clusters) shown by the mid-panel:
-
All distances considered

-
Only 16M distances considered for the distribution

When I started this toy project, I sort of knew that with 2 clusters the distribution of distances is going to show a peak corresponding the the intra-cluster distances and another one corresponding to inter-cluster distances, but I couldn't imagine exactly what the distance distribution will show in between or how it would look for more than 2 clusters. This was the very purpose of putting together this application. For any others curious enough to be interested but not to the point of compiling/running the app, some screen captures
| no cluster | 1 cluster ND (normal distributed) |
|---|---|
![]() |
![]() |
| 2 clusters ND | 3 clusters ND |
![]() |
![]() |
| 4 clusters ND | 5 cluster ND |
![]() |
![]() |
| 6 clusters ND | 7 cluster ND |
![]() |
![]() |
I wanted some code I can reuse in other (than Qt/2D) contexts, just in case the exploration turns something interesting and I'd need to get explore it further on non-GUI conditions.
The src/model contains the (templated) classes to do it. Notable:
- the
src/model/model.hppclasses:
-
npoint<CoordType, Dim>- insrc/model/model.hpp. A N-dim point with coordinates expressed asCoordType, in fact just a typedef forEigen::template Matrix<C, 1, DIM, Eigen::RowMajor>(vector arithmetic already available, plus some other interesting ops I'll mention later - if I won't forget) -
npoint_grp<C, DIM>andnpoint_grp<C, DIM, Transform>- two templated classes to support the concept of "cluster of points". The first one is just a storage similar withstd::vector<npoint<...>>`, the second one will offers support for "transformation on-the-fly" (stores the points as a vector, but when asked for one, offers a transformed copy of the requested point) -
bound_npoint_cloud<class Supplier, typename CoordType, DIM>a pure virtual collection of "point clusters" (with type designated byclass Suppliertemplate parameter) with the (pure virtual)within_boundsmethod to tell if ant point supplied by the storedSupplierinstances are acceptable. The way to obtain the points is only by-copy, using thevoid points_copy(Container& dest) constmethod.
Most of the time, theSuppliers managed by this class will be instances ofnpoint_grp<C, DIM, ...>(or derived) but any other class implementing the requirements (briefly) documented in the class comments will simply do the job (compile time polymorphism) - the
bbox_npoint_cloud<Supplier, CoordType, DIM>is just abound_npoint_cloud<class Supplier, typename CoordType, DIM>with the bounds defined by a rectangular N-dim box; - the
histogram<CoordType>an abstract class modelling a histogram, with the derivedfixedl_histogram<CoordType>implementing support for histograms with lower/upper bounds known at creation type (fixed limits histogram);
-
src/model/dists.hpp- ah, yes, another example of compile-time polymorphism. Apart from thel2distance (providing the euclidian norm distance) there's also a support for Mahalanobis distance. The main difference between the two: whilel2is independent of any context to do the computations, themahalanobisclass will need to be initialized with the reference set of points to do the job, with thevoid update(const Supplier* src)method needing to be called before theoperator()(npoint..., npoint...)can return a proper distance. Asl2does not provide aupdate(const Supplier* src)method (not even a do-nothing one), the job of "updating" a distance is let to… (see next point) - the
src/model/proc.hppclasses
- … the
detail::struct dist_type_updater<SupplierType, DistType>is the class that handles (at compile type) theupdate(const SupplierType* src)action of updating theDistTypebased on the presence of absence of theupdate(...)method. The code that needs the services of a "distance" class only needs to call one of thedist_type_updater::update_dist(const Supplier& , dist_type&)ordist_type_updater::update_dist(const Supplier* , dist_type&)- if theDistTypeclass declares theupdate(const SupplierType* src)method, it will be invoked appropriately, if not the call intodist_type_updater::update_distwill result in a noop (kicked our at any non-trivial code optimisation); - the
class histogram_filler<typename CoordType, size_t DIM, class PointSupplier, class Observer>is the class that does the heavy lifting of filling in a histogram on a separate thread. The class provides astart(...)method (which will kick-of the job of filling in a histogram) and astop()method, which will make sure that the worker thread is stopped before finishing the job (if the job of filling the histogram is not complete). As, during the histogram filling, the caller may want to be notified of the progress and have partial results provided, thestart(...)will take anobservableparameter. The details are a bit more complicated, so maybe I'll cover them separately later.
- the
src/viewclasses - majority of them are just organizing the GUI flow. The interesting ones are:
- file
view/2d.hpp- theqtrn_adaptora class that packs/adapt aQTransformin a way thenpoint_grp<C, DIM, Transform>can consume. It is the way the "distortion hull" works on the "clusters positioning/distortion area" (mid pane of the GUI) acts on the points in a cluster - files
view/PointCluster[.hpp/.cpp]- thePointClusterclass that wraps/adapts anpoint_grp<C, DIM, Transform>to the Qt slot/signals - files
view/cloudmodel[.hpp/.cpp]- theCloudModelclass adapts abbox_npoint_cloud<Supplier, CoordType, DIM>to the Qt slot/signals. It also acts as the "Model" of the GUI part (Model as the M in MVC).







