Decision Tree for all to see…

My last decision tree post ended with successfully building a tree model using the sklearn library. But checking how well the (supervised) tree model predicts the known classification isn’t all I’m interested in. I also want to see the tree and look at its structure. Fortunately, that’s not very difficult. What you need are two Python libraries:

  • graphviz
  • pydotplus

If they’re not already installed, installing them is easy enough, although you might run into some trouble, as I did, by not installing them in the correct order. So I recommend the following sequence:


(base) C:\WINDOWS\system32>conda install graphviz
Solving environment: done
[...]
The following NEW packages will be INSTALLED:
    graphviz: 2.38.0-4

(base) C:\WINDOWS\system32>pip install graphviz
Successfully installed graphviz-0.8.2

(base) C:\WINDOWS\system32>conda install pydotplus
Solving environment: done
[...]
The following NEW packages will be INSTALLED:
    pydotplus: 2.0.2-py36_0

(base) C:\WINDOWS\system32>pip install pydotplus
Requirement already satisfied: pyparsing>=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from pydotplus)
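
Before going further, it can help to check from Python whether the Graphviz "dot" executable is actually visible on your PATH, since that is what the error discussed next comes down to. The snippet below is only a sketch using the standard library; the helper name find_graphviz_dot is my own and not part of graphviz or pydotplus.

```python
import shutil


def find_graphviz_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("dot not found: Graphviz's bin directory is probably missing from PATH")
else:
    print("dot found at", dot_path)
```

If this prints "dot not found", the Python packages may be installed correctly while the Graphviz binaries themselves are still unreachable.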

If you then get an error like the one below when using graphviz from Python (typically at the rendering step, where the Graphviz executables are called)…

InvocationException: GraphViz's executables not found

…then you need to add the Graphviz bin directory to your Windows Path (environment variables). Once that’s done, you should be able to do…

# tree comes from: from sklearn import tree (see last post)
# clf is the fitted classifier and X the feature DataFrame from that post
with open("c:/tree.dot", 'w') as dotfile:
    tree.export_graphviz(clf, out_file=dotfile, feature_names=X.columns)

This writes the tree to a ‘dot’ file. DOT is an established graph description language, so this is a standard format for this kind of information. To display the tree you can use the code below. There are various ways to do this, and you don’t have to create a file first the way I did.

import pydotplus
from IPython.display import Image

# Parse the DOT file and render it as a PNG inline (e.g. in a Jupyter notebook)
graph = pydotplus.graphviz.graph_from_dot_file("c:\\tree.dot")
Image(graph.create_png())

This plots the following tree diagram to the screen. It’s a bit overwhelming, in fact.

[Image: tree]
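
As mentioned, the intermediate file is optional. Here is a minimal sketch of the in-memory route: passing out_file=None makes export_graphviz return the DOT source as a string. Note the assumptions: the iris data and the small max_depth=2 tree are stand-ins I picked so the snippet runs on its own; they are not the clf and X from my last post.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Stand-in data and model (assumption): not the clf / X from the previous post
iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT description as a string
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
)

# Peek at the start of the DOT source
print(dot_data[:80])
```

In a notebook, the string can then be handed to pydotplus.graph_from_dot_data(dot_data) and rendered with Image(graph.create_png()), just like the file-based version.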
If you want to save the image to a file, you could do…

graph.write_png("c:\\tree.png")

And Python drops a new PNG in your directory of choice:

[Image: dot_png_generated]

There are all kinds of configuration options for fine-tuning your tree and its visualization. With a few extra options, in an approach inspired by statinfer.com, you can add some color and rounded boxes…

[Image: tree2]
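
The exact code behind that last image isn’t reproduced here, but one way to get a similar look is sketched below, using the filled, rounded and class_names parameters of export_graphviz. Again, the iris data and the shallow tree are stand-ins of my choosing so that the snippet is self-contained, not the model from my last post.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Stand-in data and model (assumption): not the clf / X from the previous post
iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

styled_dot = tree.export_graphviz(
    clf,
    out_file=None,                          # return the DOT source as a string
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),    # show the predicted class in each node
    filled=True,                            # color nodes by majority class
    rounded=True,                           # draw boxes with rounded corners
)

print(styled_dot[:120])
```

The resulting string can be rendered or saved the same way as before, for example via pydotplus.graph_from_dot_data(styled_dot) followed by write_png.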

You can read the full Pydotplus reference here – http://pydotplus.readthedocs.io/reference.html.

There are lots of links about building decision trees with Scikit-learn on the web. Here is just a selection of those I found helpful:

Basic intro – https://pythonprogramminglanguage.com/decision-tree-visual-example/

Entire walkthrough, not unlike mine, plus random forests… http://adataanalyst.com/scikit-learn/decision-trees-scikit-learn/

Walkthrough that also discusses GINI – http://dataaspirant.com/2017/02/01/decision-tree-algorithm-python-with-scikit-learn/

This article talks about information gain and impurity, likely a topic for one of my next tree-related posts – http://benalexkeen.com/decision-tree-classifier-in-python-using-scikit-learn/

Another example from hackernoon – https://hackernoon.com/a-brief-look-at-sklearn-tree-decisiontreeclassifier-c2ee262eab9a

Finally, a much more comprehensive discussion worth reading twice – http://stackabuse.com/decision-trees-in-python-with-scikit-learn/

 

 
