Skip to content

codespook/Decision-Tree-Builder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Decision Tree Builder

This is an implementation of Hunt's decision tree algorithm in Python. It uses the Gini index as it's impurity measure.

It takes as input a csv file of attribute tuples and their class labels in the following form --

Attribute 1, Attribute 2,.... Attribute n, Class Label
Attribute 1, Attribute 2,.... Attribute n, Class Label
Attribute 1, Attribute 2,.... Attribute n, Class Label
Attribute 1, Attribute 2,.... Attribute n, Class Label

Please note that both attributes and their associated class labels must be numeric (integers specifically)

For example an acceptable dataset might look like this --

1,1,1,2,2
1,1,1,1,2
2,1,1,2,1
3,2,1,2,1
3,3,2,2,1
3,3,2,1,2
2,3,2,1,1
1,2,1,2,2
1,3,2,2,1
3,2,2,2,1
1,2,2,1,1
2,2,1,1,1
2,1,2,2,1
3,2,1,1,2

The output decision tree will look like this -

|’root’
	|-‘child 1 of root’
		|--‘child of child 1’
		|--
		|--
		And so on...
	|-‘child 2 of root’
		|--‘child of child 2’
		|--
		|--
		And so on...
	|-
	And so on...

For example, the output produced by the above example dataset is the following --

|'Attribute 0'
	|-'Attribute 2'
		|--'Label 2'
		|--'Label 1'
	|-'Label 1'
	|-'Attribute 3'
		|--'Label 2'
		|--'Label 1'

This output can be visualized like as follows where 'att' stands for attribute and 'lab' stands for label --

Output of example dataset

DTree_Driver.py is the driver program that parses a specified input csv file and write the produced decision tree to a specified output text file.

Usage:

$ DTree_Driver.py input.csv output.txt

About

Hunt's Decision Tree Algorithm implementation in Python

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages