Implementation of a new clustering algorithm #587
Malik-Hacini
started this conversation in
Ideas
Hello everyone,
As part of a paper submitted to the Machine Learning Research Journal, our team is working on integrating a new clustering algorithm into scikit-network.
The algorithm is a generalization of Spectral Clustering to directed graphs. The primary difference lies in the computation of new Laplacian matrices (currently referred to as "generalized" Laplacians). The rest of the pipeline remains largely the same: constructing a k-nearest neighbors graph, computing the spectral embedding of the Laplacian, and assigning cluster labels (with kmeans, discretize, or propagation).
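To make the pipeline concrete, here is a minimal NumPy sketch of the three steps. It is an illustration under stated assumptions, not our implementation: all function names are hypothetical, the Laplacian shown is the classical symmetrized normalized one (a stand-in for the generalized Laplacians, which are not reproduced here), and the label assignment is a small deterministic k-means written only for this example.

```python
import numpy as np

def knn_graph(X, k):
    """Directed k-NN adjacency: A[i, j] = 1 if j is among the k nearest neighbors of i."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return A

def placeholder_laplacian(A):
    """Classical stand-in (assumption): symmetrize, then L = I - D^{-1/2} W D^{-1/2}."""
    W = (A + A.T) / 2.0
    deg = W.sum(axis=1)                       # positive: every node has k out-neighbors
    s = 1.0 / np.sqrt(deg)
    return np.eye(len(W)) - s[:, None] * W * s[None, :]

def spectral_embedding(L, n_components):
    """Eigenvectors of the smallest eigenvalues of a symmetric Laplacian."""
    _, vecs = np.linalg.eigh(L)               # eigenvalues come back in ascending order
    return vecs[:, :n_components]

def kmeans(Z, n_clusters, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

# Toy data: two well-separated groups of 10 points on a line.
X = np.array([[x, 0.0] for x in list(range(10)) + list(range(100, 110))])
A = knn_graph(X, k=2)                         # asymmetric: the k-NN relation is directed
Z = spectral_embedding(placeholder_laplacian(A), n_components=2)
labels = kmeans(Z, n_clusters=2)
```

Note that the k-NN graph is already directed before any symmetrization (here point 0 points to point 2, but not the reverse), which is the information the classical pipeline discards.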
The research is still ongoing, particularly the tuning of base parameters, but the preliminary results are promising. Since our method encompasses classical Spectral Clustering as a special case, it can always match the classical method with suitable parameters, and in our preliminary experiments it often performs better on directed data.
We’re currently exploring the best way to contribute this to the package. Given the structure of the existing spectral methods, we believe the cleanest solution would be to implement a new estimator and modify the existing spectral embedding to include the new Laplacians. This estimator would be very similar to scikit-learn's SpectralClustering estimator, since the pipeline is similar.
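As a rough idea of the shape such an estimator could take, here is a hedged sketch. The class name `DirectedSpectralClustering`, its parameters, and the `laplacian` callable are all hypothetical and exist in neither scikit-network nor scikit-learn. The default Laplacian is the classical symmetrized one, standing in for the generalized Laplacians, and the sketch assumes the callable returns a symmetric matrix so that `np.linalg.eigh` applies.

```python
import numpy as np

def symmetrized_laplacian(adjacency):
    """Classical special case (assumption): L = I - D^{-1/2} W D^{-1/2}, W = (A + A^T) / 2."""
    W = (adjacency + adjacency.T) / 2.0
    deg = W.sum(axis=1)
    s = np.zeros_like(deg)
    s[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])  # isolated nodes keep a zero scaling
    return np.eye(len(W)) - s[:, None] * W * s[None, :]

class DirectedSpectralClustering:
    """Hypothetical estimator with a pluggable Laplacian (fit / labels_ convention)."""

    def __init__(self, n_clusters=2, laplacian=symmetrized_laplacian, n_iter=50):
        self.n_clusters = n_clusters
        self.laplacian = laplacian            # swap in a generalized Laplacian here
        self.n_iter = n_iter

    def fit(self, adjacency):
        L = self.laplacian(np.asarray(adjacency, dtype=float))
        _, vecs = np.linalg.eigh(L)           # assumes L is symmetric
        self.embedding_ = vecs[:, :self.n_clusters]
        self.labels_ = self._assign(self.embedding_)
        return self

    def fit_predict(self, adjacency):
        return self.fit(adjacency).labels_

    def _assign(self, Z):
        # Small k-means with deterministic farthest-point initialization.
        centers = [Z[0]]
        for _ in range(1, self.n_clusters):
            d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(Z[np.argmax(d)])
        centers = np.array(centers)
        for _ in range(self.n_iter):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = np.argmin(dist, axis=1)
            for c in range(self.n_clusters):
                if np.any(labels == c):
                    centers[c] = Z[labels == c].mean(axis=0)
        return labels

# Toy input: two disjoint directed 4-cycles (0->1->2->3->0 and 4->5->6->7->4).
adjacency = np.zeros((8, 8))
for start in (0, 4):
    for i in range(4):
        adjacency[start + i, start + (i + 1) % 4] = 1.0

model = DirectedSpectralClustering(n_clusters=2)
labels = model.fit_predict(adjacency)
```

Keeping the Laplacian as a constructor argument would let the classical method remain the default while the new Laplacians are added without touching the rest of the pipeline.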
We're looking for guidance on a few fronts: suggestions from contributors familiar with the spectral module, performance optimizations, ...
Any help or feedback would be greatly appreciated!
Thank you in advance!