Skip to content

Dawson-ma/MySQL-clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MySQL Clustering

Implementing MySQL Loadable Function for clustering algorithm including kmeans, hierarchical clustering, and DB scan, which allows these clustering methods can be used right in the MySQL.

If you find it useful, please give us a star or fork it:

GitHub stars GitHub forks

Function

KMEANS(int k, int max_iter, double features...)
  Input arguments:
   k       = number of observables
   max_iter  = max iteration
Hierar_cluster(int n, string method, double features...)
   Input arguments:
   n       = number of observables
   method  = cluster metric: "single", "complete", "average", "median"
DBSCAN(double epsilon, int minPoint, double features ...)
   Input arguments:
    epsilon   = radius of query region
    minPoint  = minima num. of the points in a query region

Usage

Create

Run following SQL queries to active the loadable functions:

CREATE AGGREGATE FUNCTION KMEANS RETURNS STRING
  SONAME 'clustering.dll';
CREATE AGGREGATE FUNCTION DBSCAN RETURNS STRING
  SONAME 'clustering1.dll';
CREATE AGGREGATE FUNCTION Hierar_cluster RETURNS STRING
  SONAME 'clustering.dll';

Create SQL procedure functions to combine the results with selected table:

DELIMITER $$
CREATE FUNCTION SPLIT_STR(
  x LONGTEXT,
  delim VARCHAR(10),
  pos INT
)
RETURNS LONGTEXT DETERMINISTIC
BEGIN 
    RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');
END$$
DELIMITER ;

DELIMITER #
CREATE PROCEDURE `cluster`(
	IN `STRING` LONGTEXT,
    IN `T` VARCHAR(1000))
BEGIN
DECLARE v_max INT UNSIGNED DEFAULT 1;
DECLARE v_counter INT UNSIGNED DEFAULT 0;
SET SQL_SAFE_UPDATES = 0;
SET @v_counter = 1;
SET @TABLE1 = `T`;
SET @VAR = NULL;
SET @C = 0;
SET @s = CONCAT('SELECT COUNT(`ID`) INTO @C FROM ', @TABLE1); 
SET @R = (SELECT table_name FROM information_schema.tables WHERE table_name=`T`);
SELECT @R;
PREPARE stmt1 FROM @s; 
EXECUTE stmt1;
SET v_max = @C +1;
DEALLOCATE PREPARE stmt1;
  START TRANSACTION;
  IF NOT EXISTS(SELECT NULL
            FROM INFORMATION_SCHEMA.COLUMNS
           WHERE table_name = @TABLE1 AND
           column_name = 'cluster') THEN
	SET @d = CONCAT('ALTER TABLE ', @TABLE1, ' ADD COLUMN cluster INT');
    PREPARE stmt2 FROM @d; 
	EXECUTE stmt2;
    DEALLOCATE PREPARE stmt2;
  END IF;
  WHILE @v_counter < v_max DO
	SET @VAR = (SELECT SPLIT_STR(`STRING`, ',', @v_counter));
    /*SELECT @VAR := (SELECT SPLIT_STR(`STRING`, ',', @v_counter));*/
    SET @e = CONCAT('UPDATE ', @TABLE1, ' SET cluster = @VAR WHERE ID = @v_counter');
    PREPARE stmt3 FROM @e; 
	EXECUTE stmt3;
    SET @v_counter=@v_counter+1;
    DEALLOCATE PREPARE stmt3;
  END WHILE;
SET @f = CONCAT('SELECT * FROM ', @TABLE1);
    PREPARE stmt4 FROM @f; 
	EXECUTE stmt4;
DEALLOCATE PREPARE stmt4;
END #
DELIMITER ;

Use functions to cluster data with selected features:

CALL cluster((SELECT KMEANS(5, 20, features_1, features_2, features_3)FROM table), 'table');
CALL cluster((SELECT Hierar_cluster(5, 'simple', features_1, features_2, features_3)FROM table), 'table');
CALL cluster((SELECT DBScan(3, 'average', features_1, features_2, features_3)FROM table), 'table');

Drop

Release loadable functions:

DROP FUNCTION IF EXISTS KMEANS;
DROP FUNCTION IF EXISTS Hierar_cluster;
DROP FUNCTION IF EXISTS DBScan;

Release procedure functions:

DROP FUNCTION SPLIT_STR;
DROP PROCEDURE cluster;

Algorithm

K-means

Hierarchicial clustering


source: https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html

DBSCAN

Installation

1. Using dll file

Please copy clustering.dll to the plugin_dir directory. If you can not find the plugin_dir, you can get the location of plugin_dir by running the following SQL queries:

select @@plugin_dir

2. cmake

If method 1 does not work, please follow the below steps to build dll file. Ensure the CMake(v2.6 or above) build utility is installed.
Build a directory to store output files:

mkdir build
cd build

Run the commands with your generator (ex. visual studio 16 2019) in the base directory:

cmake -G "generator" ..
devenv clustering.sln /build Release

This will build the MySQL library file clustering.dll in the directory Release. Please copy clustering.dll to the plugin_dir directory. If you can not find the plugin_dir, you can get the location of plugin_dir by running the following SQL queries:

select @@plugin_dir

About

Implementing MySQL Loadable Function for clustering algorithm including kmeans, hierarchical clustering, and DB scan, which allows these clustering methods can be used right in the MySQL.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors