MySQL Clustering

Implementing MySQL Loadable Function for clustering algorithm including kmeans, hierarchical clustering, and DB scan, which allows these clustering methods can be used right in the MySQL.

If you find it useful, please give us a star or fork it:

Function

KMEANS(int k, int max_iter, double features...)
  Input arguments:
   k       = number of observables
   max_iter  = max iteration

Hierar_cluster(int n, string method, double features...)
   Input arguments:
   n       = number of observables
   method  = cluster metric: "single", "complete", "average", "median"

DBSCAN(double epsilon, int minPoint, double features ...)
   Input arguments:
    epsilon   = radius of query region
    minPoint  = minima num. of the points in a query region

Usage

Create

Run following SQL queries to active the loadable functions:

CREATE AGGREGATE FUNCTION KMEANS RETURNS STRING
  SONAME 'clustering.dll';
CREATE AGGREGATE FUNCTION DBSCAN RETURNS STRING
  SONAME 'clustering1.dll';
CREATE AGGREGATE FUNCTION Hierar_cluster RETURNS STRING
  SONAME 'clustering.dll';

Create SQL procedure functions to combine the results with selected table:

DELIMITER $$
CREATE FUNCTION SPLIT_STR(
  x LONGTEXT,
  delim VARCHAR(10),
  pos INT
)
RETURNS LONGTEXT DETERMINISTIC
BEGIN 
    RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');
END$$
DELIMITER ;

DELIMITER #
CREATE PROCEDURE `cluster`(
	IN `STRING` LONGTEXT,
    IN `T` VARCHAR(1000))
BEGIN
DECLARE v_max INT UNSIGNED DEFAULT 1;
DECLARE v_counter INT UNSIGNED DEFAULT 0;
SET SQL_SAFE_UPDATES = 0;
SET @v_counter = 1;
SET @TABLE1 = `T`;
SET @VAR = NULL;
SET @C = 0;
SET @s = CONCAT('SELECT COUNT(`ID`) INTO @C FROM ', @TABLE1); 
SET @R = (SELECT table_name FROM information_schema.tables WHERE table_name=`T`);
SELECT @R;
PREPARE stmt1 FROM @s; 
EXECUTE stmt1;
SET v_max = @C +1;
DEALLOCATE PREPARE stmt1;
  START TRANSACTION;
  IF NOT EXISTS(SELECT NULL
            FROM INFORMATION_SCHEMA.COLUMNS
           WHERE table_name = @TABLE1 AND
           column_name = 'cluster') THEN
	SET @d = CONCAT('ALTER TABLE ', @TABLE1, ' ADD COLUMN cluster INT');
    PREPARE stmt2 FROM @d; 
	EXECUTE stmt2;
    DEALLOCATE PREPARE stmt2;
  END IF;
  WHILE @v_counter < v_max DO
	SET @VAR = (SELECT SPLIT_STR(`STRING`, ',', @v_counter));
    /*SELECT @VAR := (SELECT SPLIT_STR(`STRING`, ',', @v_counter));*/
    SET @e = CONCAT('UPDATE ', @TABLE1, ' SET cluster = @VAR WHERE ID = @v_counter');
    PREPARE stmt3 FROM @e; 
	EXECUTE stmt3;
    SET @v_counter=@v_counter+1;
    DEALLOCATE PREPARE stmt3;
  END WHILE;
SET @f = CONCAT('SELECT * FROM ', @TABLE1);
    PREPARE stmt4 FROM @f; 
	EXECUTE stmt4;
DEALLOCATE PREPARE stmt4;
END #
DELIMITER ;

Use functions to cluster data with selected features:

CALL cluster((SELECT KMEANS(5, 20, features_1, features_2, features_3)FROM table), 'table');
CALL cluster((SELECT Hierar_cluster(5, 'simple', features_1, features_2, features_3)FROM table), 'table');
CALL cluster((SELECT DBScan(3, 'average', features_1, features_2, features_3)FROM table), 'table');

Drop

Release loadable functions:

DROP FUNCTION IF EXISTS KMEANS;
DROP FUNCTION IF EXISTS Hierar_cluster;
DROP FUNCTION IF EXISTS DBScan;

Release procedure functions:

DROP FUNCTION SPLIT_STR;
DROP PROCEDURE cluster;

Algorithm

K-means

Hierarchicial clustering

source: https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html

DBSCAN

Installation

1. Using dll file

Please copy clustering.dll to the plugin_dir directory. If you can not find the plugin_dir, you can get the location of plugin_dir by running the following SQL queries:

select @@plugin_dir

2. cmake

If method 1 does not work, please follow the below steps to build dll file. Ensure the CMake(v2.6 or above) build utility is installed.
Build a directory to store output files:

mkdir build
cd build

Run the commands with your generator (ex. visual studio 16 2019) in the base directory:

cmake -G "generator" ..
devenv clustering.sln /build Release

This will build the MySQL library file clustering.dll in the directory Release. Please copy clustering.dll to the plugin_dir directory. If you can not find the plugin_dir, you can get the location of plugin_dir by running the following SQL queries:

select @@plugin_dir

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
algorithm		algorithm
img		img
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
clustering.cc		clustering.cc
clustering.def		clustering.def
clustering.dll		clustering.dll
example.sql		example.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MySQL Clustering

Function

Usage

Create

Drop

Algorithm

K-means

Hierarchicial clustering

DBSCAN

Installation

1. Using dll file

2. cmake

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MySQL Clustering

Function

Usage

Create

Drop

Algorithm

K-means

Hierarchicial clustering

DBSCAN

Installation

1. Using dll file

2. cmake

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages