Machine learning using the decision tree algorithm in Python

The decision tree algorithm is a very basic method for predicting the result for a given input from previously known data (knowledge). Let us see how it can be used in machine learning to make predictions.
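To make the idea concrete, here is a tiny decision tree written by hand: each `if` is one question about the input, and each `return` is a leaf with the answer. The thresholds are hypothetical, picked by eye for illustration only. A learning algorithm such as scikit-learn's finds questions like these automatically from labelled data.

```python
def guess_animal(height, length, weight):
    """A hand-written decision tree: each `if` is one question (a node)."""
    # Hypothetical thresholds chosen by hand for illustration; a learning
    # algorithm would pick them from the training data instead.
    # (This particular tree happens not to need `length`.)
    if weight > 190:   # very heavy animals -> cow
        return "cow"
    if height > 32:    # medium-sized animals -> dog
        return "dog"
    return "cat"       # everything else -> cat

print(guess_animal(20, 35, 5))      # cat
print(guess_animal(100, 200, 420))  # cow
```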

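To start with, you need the Python scikit-learn module. The first command below (run as root) installs a Fortran compiler and the ATLAS, OpenBLAS and LAPACK linear-algebra libraries; the second installs scikit-learn from piwheels, a package index of prebuilt wheels for the Raspberry Pi. On other platforms, pip3 install scikit-learn is usually enough.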


aptitude install gfortran libatlas-base-dev libopenblas-dev liblapack-dev
pip3 install scikit-learn --index-url https://piwheels.org/simple
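A quick way to confirm the installation worked is to import the module and print its version. This is only a sanity check and assumes nothing beyond a working scikit-learn install.

```python
# Sanity check: scikit-learn imports and the decision tree class is available
import sklearn
from sklearn import tree

print("scikit-learn version:", sklearn.__version__)
print(tree.DecisionTreeClassifier())  # shows the classifier object
```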

Think of a problem statement, say, predicting a domestic animal from its height, length and weight.

Let's think of cats, dogs and cows.

Prepare a data set of a few samples, say 3 cats, 4 dogs and 4 cows (the more samples, the better the prediction).
Label each data sample with the animal you recorded, and assign a number to each animal, say “0=cat, 1=dog, 2=cow”. For example:

AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]

animals = [0,0,0,1,1,1,1,2,2,2,2]
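The numeric mapping is a convenience rather than a requirement: scikit-learn classifiers also accept string labels directly and return them from `predict()`. A minimal sketch with the same measurements; the `animal_names` list is introduced here for illustration and `random_state=0` is added only to make runs reproducible.

```python
from sklearn import tree

# Same measurements as above: [height, length, weight]
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
# Hypothetical string labels instead of 0/1/2
animal_names = ["cat"] * 3 + ["dog"] * 4 + ["cow"] * 4

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(AnimalSize, animal_names)

print(clf.classes_)                   # ['cat' 'cow' 'dog'] (sorted)
print(clf.predict([[20, 35, 5]])[0])  # cat
```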

Code that guesses the species:


#!/usr/bin/python3

from sklearn import tree

# Size list: each sample is [height, length, weight]
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]

# Label of each sample above: 0=cat, 1=dog, 2=cow
animals = [0,0,0,1,1,1,1,2,2,2,2]

# Train the decision tree on the known data
clafr = tree.DecisionTreeClassifier()
clafr = clafr.fit(AnimalSize, animals)

AniInp = input("What size of animal you saw ? (height, length, weight):")

# Turn "20, 35, 5" into [20.0, 35.0, 5.0]; predict() takes a list of samples
sample = [float(x) for x in AniInp.split(",")]
# predict() returns one label per sample, so take the first (and only) one
ans = clafr.predict([sample])[0]

if ans == 0:
    print("cat")
elif ans == 1:
    print("dog")
elif ans == 2:
    print("cow")
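To see what the classifier actually learned, scikit-learn can print the fitted tree as indented text with `tree.export_text()`. This sketch refits the same data non-interactively; the feature names are supplied here only to make the output readable, and `random_state=0` is added for reproducibility.

```python
from sklearn import tree

AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]  # 0=cat, 1=dog, 2=cow

clafr = tree.DecisionTreeClassifier(random_state=0)
clafr.fit(AnimalSize, animals)

# Print the learned questions (feature <= threshold) and the leaf classes
rules = tree.export_text(clafr, feature_names=["height", "length", "weight"])
print(rules)
print("depth:", clafr.get_depth(), "leaves:", clafr.get_n_leaves())
```

For this data the tree needs only two questions: one that separates the cows, then one that separates cats from dogs, giving three leaves.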

Let's ask the above program (saved as animal.py and made executable with chmod +x animal.py) to make some predictions:


./animal.py
What size of animal you saw ? (height, length, weight):20, 35, 5
cat

./animal.py
What size of animal you saw ? (height, length, weight):40, 80, 35
dog

./animal.py
What size of animal you saw ? (height, length, weight):100, 200, 420
cow
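The three runs above can also be done in one call without typing input, since `predict()` accepts a list of samples and returns one label per row. A minimal sketch; the `names` dictionary is a helper introduced here, and `random_state=0` is added for reproducibility.

```python
from sklearn import tree

AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]
names = {0: "cat", 1: "dog", 2: "cow"}  # hypothetical helper for printing

clafr = tree.DecisionTreeClassifier(random_state=0)
clafr.fit(AnimalSize, animals)

# The same three inputs as the interactive runs, one row per sample
queries = [[20, 35, 5], [40, 80, 35], [100, 200, 420]]
for q, label in zip(queries, clafr.predict(queries)):
    print(q, "->", names[label])
```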

You can further improve the predictions of the above program by confirming each prediction and feeding the confirmed input back into the pre-known data, i.e. the knowledge.
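One way to sketch that feedback loop: append each confirmed observation and its correct label to the known data, then retrain. The function name `learn_from_feedback` and the example sample are hypothetical; refitting from scratch is the simplest approach, since `DecisionTreeClassifier` does not support incremental updates.

```python
from sklearn import tree

AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]  # 0=cat, 1=dog, 2=cow

def learn_from_feedback(sample, correct_label):
    """Hypothetical helper: add a confirmed observation and retrain the tree."""
    AnimalSize.append(sample)
    animals.append(correct_label)
    clf = tree.DecisionTreeClassifier(random_state=0)
    return clf.fit(AnimalSize, animals)

# Hypothetical example: the user confirms a small dog they measured
clafr = learn_from_feedback([30, 60, 10], 1)
print(len(AnimalSize), "samples known")
print(clafr.predict([[30, 60, 10]])[0])  # 1 (dog)
```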

Now, can you guess why social media sites ask their users to tag photos and locations and to use hashtags? 🙂 They need data, more and more of it, to improve their knowledge and build intelligence on top of it.